CI Failure (Bad Logs: cannot take snapshot before raft start offset) in ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy #14220

Closed
rockwotj opened this issue Oct 17, 2023 · 1 comment · Fixed by #15945
Labels
area/cloud-storage Shadow indexing subsystem ci-failure kind/bug Something isn't working

Comments

@rockwotj
Contributor

https://buildkite.com/redpanda/vtools/builds/9373
https://buildkite.com/redpanda/redpanda/builds/36927

Module: rptest.tests.e2e_shadow_indexing_test
Class: ShadowIndexingWhileBusyTest
Method: test_create_or_delete_topics_while_busy
Arguments: {
    "short_retention": true,
    "cloud_storage_type": 1
}
test_id:    ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy
status:     FAIL
run time:   891.477 seconds

<BadLogLines nodes=ip-172-31-11-252(1) example="ERROR 2023-09-08 09:49:04,309 [shard 2:main] cluster - [{kafka/topic-nspqyicand/10} (log_eviction_stm.snapshot)] - log_eviction_stm.cc:130 - Error occurred when attempting to write snapshot: std::logic_error (Can not take snapshot of a state from before raft start offset. Requested offset: 5110, start offset: 5275)">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 142, in wrapped
    redpanda.raise_on_bad_logs(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1244, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-11-252(1) example="ERROR 2023-09-08 09:49:04,309 [shard 2:main] cluster - [{kafka/topic-nspqyicand/10} (log_eviction_stm.snapshot)] - log_eviction_stm.cc:130 - Error occurred when attempting to write snapshot: std::logic_error (Can not take snapshot of a state from before raft start offset. Requested offset: 5110, start offset: 5275)">
rockwotj added the ci-failure and kind/bug labels on Oct 17, 2023
@dotnwat
Member

dotnwat commented Oct 18, 2023

@VladLazar @bharathv @mmaslankaprv do you think this is storage or replication team?

dotnwat added the area/cloud-storage label on Dec 15, 2023
mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue Jan 4, 2024
It is perfectly possible that the install snapshot request will reach
the follower right before the `log_eviction_stm` asks for the snapshot
creation. In this case the `stm_manager` would do the check and throw an
exception informing that the snapshot cannot be taken. To handle the
situation gracefully, added a check in the log eviction stm to skip
taking the snapshot if the start offset has already progressed.

Fixes: redpanda-data#14220

Signed-off-by: Michal Maslanka <michal@redpanda.com>
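
The race described in the commit message can be sketched with a toy model. This is only an illustration in Python, not the actual C++ change in `log_eviction_stm.cc`: `ToyRaft`, `evict_up_to`, `SnapshotBeforeStartOffset`, and the exact boundary rule (start offset = snapshot offset + 1) are all hypothetical names and assumptions. The offsets 5110 and 5275 are taken from the error line in this issue.

```python
from dataclasses import dataclass


class SnapshotBeforeStartOffset(Exception):
    """Hypothetical stand-in for the std::logic_error seen in the log line."""


@dataclass
class ToyRaft:
    """Toy model of a follower's raft log; not the real Redpanda type."""
    start_offset: int = 0

    def install_snapshot(self, last_included_offset: int) -> None:
        # Assumption: installing a snapshot moves the start offset just past it.
        self.start_offset = max(self.start_offset, last_included_offset + 1)

    def write_snapshot(self, requested_offset: int) -> None:
        # Models the check that produced the error in this issue.
        if requested_offset < self.start_offset:
            raise SnapshotBeforeStartOffset(
                f"Requested offset: {requested_offset}, "
                f"start offset: {self.start_offset}")
        self.start_offset = requested_offset + 1


def evict_up_to(raft: ToyRaft, requested_offset: int) -> bool:
    """Sketch of the guard: skip the snapshot if the start offset already progressed.

    Returns True if a snapshot was written, False if it was skipped.
    """
    if requested_offset < raft.start_offset:
        return False  # an install snapshot got there first; nothing to do
    raft.write_snapshot(requested_offset)
    return True


raft = ToyRaft()
raft.install_snapshot(5274)          # start offset becomes 5275
skipped = not evict_up_to(raft, 5110)  # stale eviction request is skipped
```

Without the guard, calling `raft.write_snapshot(5110)` directly at this point raises the exception, which is the situation the bad log line reports.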
mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue Jan 19, 2024
(cherry picked from commit 4433851)
ballard26 pushed a commit to ballard26/redpanda that referenced this issue Jan 27, 2024