Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: check that we have serialized the whole manifest #6507

Merged
merged 1 commit into from
Sep 22, 2022

Conversation

ztlpn
Copy link
Contributor

@ztlpn ztlpn commented Sep 22, 2022

Cover letter

It turns out that if the underlying buffer object throws (e.g. a bad_alloc exception), std::ostream swallows the exception and sets the "badbit". If we don't check it, this can lead to the serialized manifest being truncated and to corrupt manifests being uploaded to the cloud storage. To prevent that we check that std::ostream is in a good state after serializing manifests.

More info: https://en.cppreference.com/w/cpp/io/ios_base/iostate

Backport Required

  • v22.2.x
  • v22.1.x
  • v21.11.x

UX changes

none

Release notes

Bug fixes

  • Fix possible shadow indexing manifest corruption under memory pressure.

It turns out that if the underlying buffer object throws (e.g. a
bad_alloc exception), std::ostream swallows the exception and sets the
"badbit". If we don't check it, this can lead to the serialized manifest
being truncated and to corrupt manifests being uploaded to the cloud
storage. To prevent that we check that std::ostream is in a good state
after serializing manifests.

More info: https://en.cppreference.com/w/cpp/io/ios_base/iostate
Copy link
Contributor

@Lazin Lazin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ztlpn
Copy link
Contributor Author

ztlpn commented Sep 22, 2022

/backport v22.2.x

@vbotbuildovich
Copy link
Collaborator

The pull request is not merged yet. Cancelling backport...

Workflow run logs.

Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that if the underlying buffer object throws (e.g. a
bad_alloc exception), std::ostream swallows the exception and sets the
"badbit". If we don't check it, this can lead to the serialized manifest
being truncated and to corrupt manifests being uploaded to the cloud
storage. To prevent that we check that std::ostream is in a good state
after serializing manifests.

oh my. good catch. should we use a different serialization method?

@ztlpn ztlpn merged commit 6c177d4 into redpanda-data:dev Sep 22, 2022
@ztlpn
Copy link
Contributor Author

ztlpn commented Sep 22, 2022

/backport v22.2.x

@ztlpn
Copy link
Contributor Author

ztlpn commented Sep 22, 2022

@dotnwat I guess implementing a rapidjson stream that wraps iobuf directly is not that hard. BTW it writes json one character at a time. I hope that iobuf is okay with that and doesn't create excessive number of fragments.

@dotnwat
Copy link
Member

dotnwat commented Sep 23, 2022

I hope that iobuf is okay with that and doesn't create excessive number of fragments.

it should be efficient for this case and use existing free space in the last fragment

@mmedenjak mmedenjak added kind/bug Something isn't working area/cloud-storage Shadow indexing subsystem and removed area/redpanda labels Sep 23, 2022
@ztlpn ztlpn deleted the fix-manifest-corruption branch November 27, 2023 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloud-storage Shadow indexing subsystem kind/bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants