Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[transaction] Fix group_*_tx events compaction #6086

Merged
merged 3 commits into from
Aug 22, 2022

Conversation

VadimPlh
Copy link
Contributor

@VadimPlh VadimPlh commented Aug 18, 2022

Cover letter

Problem is group_*_tx contains only producer_id in key, so compaction save only last records for this events.
We need only save in logs consumed offset after commit_txs. For this we can write store_offset event together with group_commit_tx.

New Problem: this 2 events contains different record batch type. So we can not put it in one batch and write on disk atomically. But it is not a problem, because client got ok for commit_request (see tx_gateway_frontend). So redpanda will eventually finish commit and complete write for both this events.

Fixes #5163

Backport Required

  • not a bug fix
  • papercut/not impactful enough to backport
  • v22.2.x
  • v22.1.x
  • v21.11.x

UX changes

  • none

Release notes

Bug Fixes

  • Enable compaction for group_*_tx log records

@VadimPlh
Copy link
Contributor Author

VadimPlh commented Aug 18, 2022

Fail

rptest.tests.controller_upgrade_test

Validates that cluster is operational when upgrading controller log

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 51, in wrapped
    self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
  File "/root/tests/rptest/services/redpanda.py", line 1272, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-5(1) example="ERROR 2022-08-18 11:27:53,719 [shard 1] r/heartbeat - heartbeat_manager.cc:284 - cannot find consensus group:28">

@VadimPlh VadimPlh changed the title Fix tx group compaction [transaction] Fix group_*_tx events compaction Aug 18, 2022
Copy link
Contributor

@rystsov rystsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Please comment this PR with a add link to the ci-failure issue for each failing test (one comment per failure); if issue doesn't exist and it isn't caused by this change - create an issue and mark it with pr-blocker label; mark pr blockers as BLOCKER in the comment to this PR (see #6003 as an example).

Small refactor to reuse function to build store_offset batch
It is fix for redpanda-data#5163.
Problem is group_*_tx contains only producer_id in key, so compaction
save only last records for this events. We need only save in logs
consumed offset after commit_txs. For this we can write store_offset
event together with group_commit_tx.

Problem: this 2 events contains different record batch type. So we can
not put it in one batch and write on disk atomically. But it is not a
problem, because client got ok for commit_request (see
tx_gateway_frontend). So redpanda will eventually finish commit and
complete write for both this events.
@VadimPlh
Copy link
Contributor Author

Failure from first run was #5886

Copy link
Contributor

@rystsov rystsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix!

@rystsov rystsov merged commit a2cc309 into redpanda-data:dev Aug 22, 2022
@mmedenjak mmedenjak added kind/bug Something isn't working area/transactions and removed area/redpanda labels Aug 23, 2022
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/transactions kind/bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compaction for group_* should not lose info about previous commit_tx events
4 participants