Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v23.1.x] cloud_storage: Prevent segment reuploads from adjacent segment merger #9845

Conversation

Lazin
Copy link
Contributor

@Lazin Lazin commented Apr 5, 2023

Backport #9657

This PR fixes #9651

Currently, the segment_collector is initialised using desired size of the segment reupload. The adjacent segment merger scans the manifest and finds the right upload candidate (a series of small segments). Then it calculates its size and uses it to create segment_collector. This is not correct because if the last segment is not aligned perfectly the collector will return a result without one last segment. This last segment will overflow the size target and be discarded.

With some configurations (close min and target values for adjacent segment merger) this can lead to a situation when the segment gets re uploaded multiple times.

To avoid this several things were added:

  • The adjacent_segment_merger checks the size of the segment run that it gets out of the segment_collector before uploading. If the size is smaller it discards the result.
  • The segment_collector is updated to accept extra parameter which is used as a target offset for the end of the offset range. If this parameter is set the segment _collector will continue scanning until it will reach the segment that contains the target offset. Then it will adjust the upload so this last segment won't be fully uploaded. Instead, it will be uploaded up to the target offset.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Bug Fixes

  • Improve precision of the adjacent segment merger to avoid unnecessary segment reuploads

Lazin and others added 4 commits April 5, 2023 14:37
It is possible that the reuploaded segment is smaller than predicted. In
this case it's not safe to reupload it because adjacent segment merging
may try to reupload it second time. This commit blocks reupload if the
size doesn't match the result of the manifest scan.

(cherry picked from commit dc855e2)
Add new optional parameter to the c-tor. The parameter accepts inclusive
end offset of the offset range. The segment_collector is forced to reach
the target end offset if it's set. This allows adjacent_segment_merger
to reupload offset ranges more precisely since it's no longer allowed to
have some slack.

(cherry picked from commit 441fb14)
The segment_collector calculates the size of the segment sequence during
the initial scan and then it uses it to produce an upload candidate. The
'make_upload_candidate' method is a future with several scheduling
points. If the size of the last segment changes during the call the
method will return incorrect content length. This commit fixes this by
caching the size of the last segment.

The problem only affect adjacent segment reuploads. It shouldn't affect
compacted segment reupload because the last compacted segment is always
rolled and its size can't change.

(cherry picked from commit ccb0e0a)
@Lazin Lazin merged commit 2b5a423 into redpanda-data:v23.1.x Apr 5, 2023
@BenPope BenPope added this to the v23.1.7 milestone Apr 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants