
CompactionE2EIdempotencyTest.test_basic_compaction: last offset off by one error #8698

Closed
rystsov opened this issue Feb 7, 2023 · 3 comments
Labels
area/transactions ci-disabled-test ci-failure ci-ignore Automatic ci analysis tools ignore this issue kind/bug Something isn't working sev/medium Bugs that do not meet criteria for high or critical, but are more severe than low.

Comments

@rystsov
Contributor

rystsov commented Feb 7, 2023

Sometimes, when a consumer queries the last offset (the least upper bound, LUB)

consumer.seekToEnd(tps);
long lub = consumer.position(tp);

and then reads from the beginning until its position reaches that offset

consumer.seekToBeginning(tps);
while (consumer.position(tp) < lub) {
    consumer.poll(Duration.ofMillis(100)); // java.time.Duration; the timeout is arbitrary
}

it gets stuck: the position stops advancing before it reaches lub, so the loop never exits.
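The hang can be modelled without a broker. This is a minimal sketch, assuming a hypothetical PositionSource interface in place of KafkaConsumer: StalledPosition advances its position by one per poll() but never past a fixed offset, which mimics the stuck consumer. readUntil keeps the exit condition of the loop above and adds a hypothetical maxPolls guard so that a stalled position ends the loop instead of hanging.

```java
/**
 * Broker-free model of the read-to-end loop. PositionSource, StalledPosition
 * and maxPolls are hypothetical names used only for this sketch.
 */
final class ReadToEnd {
    /** Stand-in for the two consumer calls used by the loop. */
    interface PositionSource {
        void poll();     // fetch more data; may advance the position
        long position(); // offset of the next record the consumer would read
    }

    /** Simulated consumer whose position grows by one per poll, capped at stallAt. */
    static final class StalledPosition implements PositionSource {
        private final long stallAt;
        private long pos = 0;

        StalledPosition(long stallAt) {
            this.stallAt = stallAt;
        }

        @Override
        public void poll() {
            if (pos < stallAt) {
                pos++;
            }
        }

        @Override
        public long position() {
            return pos;
        }
    }

    /**
     * Same exit condition as the loop in the issue (position >= lub).
     * Returns true if the position reached lub, false if maxPolls polls
     * did not get it there.
     */
    static boolean readUntil(PositionSource consumer, long lub, int maxPolls) {
        for (int polls = 0; consumer.position() < lub; polls++) {
            if (polls >= maxPolls) {
                return false; // position stopped advancing short of lub
            }
            consumer.poll();
        }
        return true;
    }
}
```

With the stuck partition's numbers, readUntil(new StalledPosition(36979), 36980, 100_000) returns false, while a position that can reach 30120 exits the loop normally.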

@rystsov
Contributor Author

rystsov commented Feb 7, 2023

https://buildkite.com/redpanda/redpanda/builds/22632#01862925-396e-479f-9ea0-86aaa7b2cc52

In this example (see initial_cleanup_policy=compact.workload=Workload.TX_UNIQUE_KEYS/7/), the positions of two consumers reached the LUB (30120 and 30541), but the position of the stuck consumer did not (36979 < 36980). The problem is with the position, not with the data: all consumers read all written data (last read offsets 36978, 30112, and 30535).

[DEBUG - 2023-02-07 00:01:11,420 - compacted_verifier - _remote_info - lineno:245]: Received {
    "total_writes": 74980,
    "total_reads": 74980,
    "min_writes": 23010,
    "min_reads": 23010,
    "partitions": [
        {
            "partition": 0, 
            "end_offset": 36980,
            "read_offset": 36978,
            "read_position": 36979,
            "written_offset": 36978,
            "consumed": false
        },
        {
            "partition": 1,
            "end_offset": 30120,
            "read_offset": 30112,
            "read_position": 30120,
            "written_offset": 30112,
            "consumed": true
        },
        {
            "partition": 2,
            "end_offset": 30541,
            "read_offset": 30535,
            "read_position": 30541,
            "written_offset": 30535,
            "consumed": true
        }
    ]
}
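The two criteria can be checked against the numbers in the log. This is an illustrative sketch, not the verifier's code: PartitionStats and LogCheck are hypothetical names, positionReachedEnd is assumed to correspond to the "consumed" flag, and readAllWritten to the observation that all written data was read.

```java
import java.util.List;

/** One "partitions" entry from the compacted_verifier log (hypothetical type). */
final class PartitionStats {
    final int partition;
    final long endOffset;     // "end_offset": the LUB from seekToEnd + position
    final long readOffset;    // "read_offset": offset of the last record read
    final long readPosition;  // "read_position": consumer position
    final long writtenOffset; // "written_offset": offset of the last write

    PartitionStats(int partition, long endOffset, long readOffset,
                   long readPosition, long writtenOffset) {
        this.partition = partition;
        this.endOffset = endOffset;
        this.readOffset = readOffset;
        this.readPosition = readPosition;
        this.writtenOffset = writtenOffset;
    }

    /** Criterion used by the loop in the issue: the position reached the LUB. */
    boolean positionReachedEnd() {
        return readPosition >= endOffset;
    }

    /** Distance between the LUB and the position. */
    long positionLag() {
        return endOffset - readPosition;
    }

    /** Data criterion: the last written record was read. */
    boolean readAllWritten() {
        return readOffset >= writtenOffset;
    }
}

final class LogCheck {
    /** Values copied from the log above. */
    static final List<PartitionStats> STATS = List.of(
        new PartitionStats(0, 36980, 36978, 36979, 36978),
        new PartitionStats(1, 30120, 30112, 30120, 30112),
        new PartitionStats(2, 30541, 30535, 30541, 30535));
}
```

Only partition 0 fails positionReachedEnd (its lag is 1), while readAllWritten holds for all three partitions, matching the observation that the problem is with the position rather than the data.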

The issue was found by compaction_e2e_test, which:

  1. writes data
  2. waits until it's compacted
  3. reads data

Notably, each consumer reads its own partition, and the partition of the stuck consumer experienced a leader re-election after the first step.
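The test's three steps can be pictured with a toy in-memory log. This is a hedged illustration of key-based log compaction in general, not Redpanda's implementation: it ignores segments, transactions, control records and leadership changes, and ToyCompaction and its methods are hypothetical. Compaction keeps only the latest record per key, and surviving records keep their original offsets, so the compacted log contains offset gaps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of write, compact, read (hypothetical, for illustration only). */
final class ToyCompaction {
    static final class Rec {
        final long offset;
        final String key;

        Rec(long offset, String key) {
            this.offset = offset;
            this.key = key;
        }
    }

    /** Step 1: append one record per key; offsets are assigned 0, 1, 2, ... */
    static List<Rec> write(String... keys) {
        List<Rec> log = new ArrayList<>();
        for (String key : keys) {
            log.add(new Rec(log.size(), key));
        }
        return log;
    }

    /** Step 2: keep only the latest record per key; offsets are not rewritten. */
    static List<Rec> compact(List<Rec> log) {
        Map<String, Long> latest = new HashMap<>();
        for (Rec r : log) {
            latest.put(r.key, r.offset);
        }
        List<Rec> kept = new ArrayList<>();
        for (Rec r : log) {
            if (latest.get(r.key).longValue() == r.offset) {
                kept.add(r);
            }
        }
        return kept;
    }

    /** Step 3: read from the beginning; returns the offsets seen, in order. */
    static List<Long> read(List<Rec> log) {
        List<Long> seen = new ArrayList<>();
        for (Rec r : log) {
            seen.add(r.offset);
        }
        return seen;
    }

    /** Position after reading everything: one past the last surviving offset. */
    static long nextPosition(List<Rec> log) {
        return log.isEmpty() ? 0 : log.get(log.size() - 1).offset + 1;
    }
}
```

For example, write("a", "b", "a", "a", "c") compacts to the offsets [1, 3, 4], and the position after reading them is 5.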

rystsov added a commit to rystsov/redpanda that referenced this issue Feb 7, 2023
rystsov added a commit to rystsov/redpanda that referenced this issue Feb 8, 2023
The commit adds:
 - validation of the written offset monotonicity
 - tracking of the last written offset (writtenOffset)
 - tracking of the current position during consumption (readPosition)
 - a stop condition for the consumption when the read offset reaches the last
   written offset

The last item is necessary to work around issue redpanda-data#8698
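The bookkeeping listed in the commit message can be sketched as follows. This is a hypothetical reconstruction, not the code from the commit: only writtenOffset and readPosition come from the message, while PartitionProgress, onWritten, onRead and consumed are invented names, and strict monotonicity assumes writes to a single partition are acknowledged in order.

```java
/**
 * Hypothetical per-partition progress tracker modelled on the commit message:
 * written-offset monotonicity, last written offset, current read position,
 * and a stop condition based on the last written offset.
 */
final class PartitionProgress {
    private long writtenOffset = -1; // last acknowledged write offset
    private long readOffset = -1;    // offset of the last consumed record
    private long readPosition = 0;   // consumer position after the last poll

    /** Records an acknowledged write; offsets must strictly increase. */
    void onWritten(long offset) {
        if (offset <= writtenOffset) {
            throw new IllegalStateException(
                "written offset went backwards: " + offset + " after " + writtenOffset);
        }
        writtenOffset = offset;
    }

    /** Records a consumed record together with the consumer position. */
    void onRead(long offset, long position) {
        readOffset = Math.max(readOffset, offset);
        readPosition = position;
    }

    /** Stop condition: the read offset reached the last written offset. */
    boolean consumed() {
        return writtenOffset >= 0 && readOffset >= writtenOffset;
    }

    long readPosition() {
        return readPosition;
    }
}
```

With partition 0's numbers (last write at 36978, last read at 36978, position 36979), consumed() returns true even though the position never reaches the end offset 36980.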
rystsov added a commit to rystsov/redpanda that referenced this issue Feb 9, 2023
@rystsov
Contributor Author

rystsov commented Feb 10, 2023

The test is disabled by relaxing the invariant (from "it's possible to consume up to the offset returned by list offsets" to "all written records are consumed") instead of by adding ok_to_fail; see 3e4a479.

andijcr pushed a commit to andijcr/redpanda that referenced this issue Feb 10, 2023
@redpanda-data redpanda-data deleted a comment from NyaliaLui Jun 14, 2023
@redpanda-data redpanda-data deleted a comment from dotnwat Jun 14, 2023
@rystsov rystsov changed the title CI Failure (consumers haven't finished) in CompactionE2EIdempotencyTest.test_basic_compaction CompactionE2EIdempotencyTest.test_basic_compaction: last offset off by one error Jun 14, 2023
@rystsov rystsov added sev/low sev/medium and removed sev/low labels Jun 14, 2023
@rystsov rystsov added the ci-ignore label Jun 21, 2023

2 participants