
tests: add topic_delete_unavailable_test, tweak tiered storage topic deletion order #7460

Merged 4 commits on Jan 4, 2023

Conversation

@jcsp (Contributor) commented Nov 23, 2022

Rebase + merge after #7547

A new test covers deleting a tiered storage topic while the
S3 backend is unavailable.

The controller behavior in this situation is to bypass the topic rather than retry applying the controller log message, so to avoid holding up local deletion we must do the local deletion first. The resulting behavior is that objects in S3 may be orphaned (this PR doesn't introduce that; it just avoids leaving the local data behind as well). A sketch of the test's shape follows below.

Related: #7433
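
For context, a rough sketch of the shape of such a test, reusing helpers that appear in the review comments below (`_populate_topic`, `s3_client.list_objects`); the `_block_s3_access`, `_topic_has_local_data`, and `kafka_tools.delete_topic` helpers here are hypothetical stand-ins, not actual rptest code:

from ducktape.utils.util import wait_until

def topic_delete_unavailable_test(self):
    # Write enough data that segments are uploaded to the bucket.
    self._populate_topic(self.topic)
    objects_before = set(
        self.redpanda.s3_client.list_objects(
            self.si_settings.cloud_storage_bucket, topic=self.topic))
    assert len(objects_before) > 0

    # Make the S3 backend unreachable, then delete the topic.
    self._block_s3_access()  # hypothetical: however the test cuts off S3
    self.kafka_tools.delete_topic(self.topic)  # hypothetical client helper

    # Local data must still be removed even though the bucket is unreachable.
    # The objects already uploaded to S3 are left behind (orphaned), as described above.
    wait_until(lambda: not self._topic_has_local_data(self.topic),  # hypothetical
               timeout_sec=60,
               backoff_sec=2)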

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

None

Release Notes

  • none

@jcsp jcsp added kind/enhance New feature or request area/tests area/cloud-storage Shadow indexing subsystem labels Nov 23, 2022
@jcsp jcsp requested a review from VladLazar November 23, 2022 11:43
@jcsp jcsp changed the title tests: add topic_delete_unavailable_test tests: add topic_delete_unavailable_test, tweak tiered storage topic deletion order Nov 23, 2022
VladLazar previously approved these changes Nov 24, 2022

@VladLazar (Contributor) left a comment


Looks good to me. A couple of very minor nits and a question.

objects_before = set(
    self.redpanda.s3_client.list_objects(
        self.si_settings.cloud_storage_bucket, topic=self.topic))
assert (len(objects_before) > 0)
VladLazar (Contributor):
nit: Extra set of parens around logical condition
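
That is, the assertion reads the same without the wrapping parentheses:

assert len(objects_before) > 0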

@jcsp (Contributor Author):
cleaned this up while rebasing

Comment on lines 164 to 168
after_keys = set([
    o.Key for o in self.redpanda.s3_client.list_objects(
        self.si_settings.cloud_storage_bucket, topic=next_topic)
])
VladLazar (Contributor):
nit: You can pass an iterator into the set constructor, but it really doesn't matter here.
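
The suggestion amounts to passing a generator expression directly to set() instead of building an intermediate list, e.g.:

after_keys = set(
    o.Key for o in self.redpanda.s3_client.list_objects(
        self.si_settings.cloud_storage_bucket, topic=next_topic))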

@jcsp (Contributor Author):
cleaned this up while rebasing

self._populate_topic(next_topic)
after_keys = set([
    o.Key for o in self.redpanda.s3_client.list_objects(
        self.si_settings.cloud_storage_bucket, topic=next_topic)
])
VladLazar (Contributor):
I'm a bit surprised you don't have to wait here. Does an upload happen quickly enough?

@jcsp (Contributor Author):
`_populate_topic` includes a wait for local segments to be trimmed, so we are guaranteed to already have objects in the bucket.
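
A rough sketch of the pattern being described (the real `_populate_topic` in the test may differ in its details; the producer helper below is hypothetical):

def _populate_topic(self, topic_name):
    # Produce enough data to roll several segments per partition.
    # (Producer setup elided; the helper name below is a placeholder.)
    self._produce_to_topic(topic_name)

    # Local segments are only trimmed after they have been uploaded, so once
    # this wait completes there are guaranteed to be objects in the bucket.
    for partition in range(self.partition_count):
        wait_for_segments_removal(self.redpanda, topic_name, partition, 5)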

@jcsp (Contributor Author) commented Nov 24, 2022

/ci-repeat 1 skip-unit dt-repeat=20 tests/rptest/tests/topic_delete_test.py::TopicDeleteCloudStorageTest

jcsp added 4 commits:

  • Unavailability of the S3 backend should not prevent cleanup of local objects.
  • Convenient when a test wants the objects for a particular topic.
  • This tests deleting a tiered storage topic while the S3 backend is unavailable. Related: redpanda-data#7433
  • This affected the new topic_delete_unavailable_test but also existing tests. The test intended to write 10MB of data per partition (i.e. ten 1MB segments), but was actually writing 10MB total across 3 partitions. wait_for_segments_removal(5) was proceeding immediately because partitions only had ~3 segments to begin with.
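
The sizing fix described in the last commit message amounts to scaling the produced byte count by the partition count; a back-of-the-envelope sketch with assumed message sizes (none of these variable names come from the actual test):

segment_size = 1024 * 1024  # 1 MiB log segments
partition_count = 3
msg_size = 16 * 1024

# Before: ~10 MiB total, spread across 3 partitions -> ~3 segments each,
# so wait_for_segments_removal(..., 5) returned immediately.
msg_count_before = 10 * segment_size // msg_size

# After: ~10 MiB per partition -> ~10 segments each, so trimming down to
# 5 segments actually exercises upload and local eviction.
msg_count_after = 10 * segment_size * partition_count // msg_size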
@jcsp (Contributor Author) commented Dec 19, 2022

Rebased for conflict.

@jcsp jcsp requested a review from VladLazar December 23, 2022 14:06
@jcsp (Contributor Author) commented Dec 23, 2022

@VladLazar this should be good for a re-review after the holidays.

@VladLazar (Contributor) left a comment

Looks good.
CI failure was #7874 which was fixed by #7949.


 # Wait for segments evicted from local storage
-for i in range(0, 3):
+for i in range(0, self.partition_count):
     wait_for_segments_removal(self.redpanda, topic_name, i, 5)
VladLazar (Contributor):
Maybe wait_for_segments_removal should fail if the initial number of segments is less than the final desired count.
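
The suggested guard might look roughly like this; `count_local_segments` is a hypothetical helper, not an existing rptest function:

def wait_for_segments_removal_checked(redpanda, topic, partition, count):
    # Hypothetical strict variant: fail fast if we already start below the
    # target, since the wait would otherwise pass without testing anything.
    initial = count_local_segments(redpanda, topic, partition)  # hypothetical
    assert initial >= count, \
        f"only {initial} segments present; cannot wait for removal down to {count}"
    wait_for_segments_removal(redpanda, topic, partition, count)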

@jcsp (Contributor Author):
Hmm, it would be nice, but I think it's going to be racy unless the test is very confident that the segment count cannot end up lower than the target. I have limited trust in most of the tests that assert segment counts as a proxy for data retention.

@jcsp (Contributor Author) commented Jan 4, 2023
