
[BUG] S3 Multi-part upload fails for remote cluster state #14808

Closed
soosinha opened this issue Jul 18, 2024 · 0 comments
soosinha commented Jul 18, 2024

Describe the bug

When S3 is used as the backing store for remote cluster state, the multi-part upload of remote state files fails with the error below.

[2024-07-16T12:53:45,471][ERROR][o.o.g.r.RemoteClusterStateService] [7c09ef8cc274078bab152a013b5cbb55] Exception during transfer of Metadata Fragment to Remote nodes
org.opensearch.gateway.remote.RemoteStateTransferException: nodes, failed entity:org.opensearch.gateway.remote.model.RemoteDiscoveryNodes@1c832901
	at org.opensearch.gateway.remote.RemoteClusterStateAttributesManager.lambda$getActionListener$2(RemoteClusterStateAttributesManager.java:106)
	at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:90)
	at org.opensearch.repositories.s3.S3BlobContainer.lambda$createFileCompletableFuture$7(S3BlobContainer.java:320)
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Failed to send multipart upload requests.
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
	at org.opensearch.repositories.s3.async.AsyncTransferManager.handleException(AsyncTransferManager.java:326)
	... 61 more
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Request content was only 177910 bytes, but the specified content-length was 5288374 bytes.
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:218)
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:182)
	... 24 more
Caused by: java.lang.IllegalStateException: Request content was only 177910 bytes, but the specified content-length was 5288374 bytes.
	at software.amazon.awssdk.http.nio.netty.internal.NettyRequestExecutor$StreamedRequest$1.onComplete(NettyRequestExecutor.java:479)
	at software.amazon.awssdk.utils.async.SimplePublisher.doProcessQueue(SimplePublisher.java:275)
	at software.amazon.awssdk.utils.async.SimplePublisher.processEventQueue(SimplePublisher.java:224)

When the S3 async upload is invoked, an IndexInput created from the serialized bytes is passed in (code ref). Internally, when the S3 plugin initializes the parts for the multi-part upload, each part seeks the file pointer of the IndexInput to the offset where that part should start reading. But because all parts share the same backing IndexInput, the file pointer ends up positioned at the last part's offset. When the S3 client then starts reading, at most one part can read any data at all, and it fails with the content-length mismatch above; the other parts cannot read a single byte, since the shared file pointer is already at the final offset.
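The failure mode above can be sketched in plain Java. This is not the actual OpenSearch/Lucene code; `Reader` is a hypothetical stand-in for an IndexInput, with one buffer and one shared position, used only to show why seeking the same instance once per part breaks the subsequent reads.

```java
public class SharedReaderDemo {
    // Hypothetical stand-in for a Lucene IndexInput: one buffer, one shared position.
    static class Reader {
        final byte[] data;
        int pos = 0;
        Reader(byte[] data) { this.data = data; }
        void seek(int p) { pos = p; }
        int read(byte[] dst) {
            int n = Math.min(dst.length, data.length - pos);
            System.arraycopy(data, pos, dst, 0, n);
            pos += n;
            return n;
        }
    }

    public static void main(String[] args) {
        byte[] blob = new byte[100];      // pretend this is the serialized cluster state
        Reader shared = new Reader(blob); // ONE reader shared by every part
        int partSize = 40;

        // Part initialization seeks the SAME reader once per part...
        for (int part = 0; part < 3; part++) {
            shared.seek(part * partSize);
        }

        // ...so by the time any part starts reading, the pointer sits at the
        // last part's offset (80), not at each part's own offset. The first
        // part expects 40 bytes but can only see the trailing 20 — the same
        // shape as the "content was only X bytes, but the specified
        // content-length was Y bytes" error in the log.
        byte[] dst = new byte[partSize];
        int n = shared.read(dst);
        System.out.println("read " + n + " of " + partSize + " bytes");
    }
}
```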

Related component

Cluster Manager

To Reproduce

  1. Create a cluster with remote state publication enabled.
  2. Keep adding nodes to the cluster until the size of DiscoveryNodes in the cluster state exceeds 5 MB.
  3. Once the size exceeds 5 MB, the S3 plugin attempts a multi-part upload.
  4. Check the logs for the upload failure exception.

Expected behavior

Multi-part upload should work correctly.

Solution

When the S3 async upload is invoked, a new IndexInput instance should be created inside the stream supplier function, so that each part reads from its own independently positioned input.
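The proposed fix can be sketched in the same plain-Java terms as the failure above. Again, `Reader` and the supplier are hypothetical stand-ins, not the plugin's actual API: the point is only that the supplier constructs a fresh instance per part, so each part's seek and read are independent.

```java
import java.util.function.Supplier;

public class PerPartReaderDemo {
    // Hypothetical stand-in for a Lucene IndexInput (same as in the bug sketch).
    static class Reader {
        final byte[] data;
        int pos = 0;
        Reader(byte[] data) { this.data = data; }
        void seek(int p) { pos = p; }
        int read(byte[] dst) {
            int n = Math.min(dst.length, data.length - pos);
            System.arraycopy(data, pos, dst, 0, n);
            pos += n;
            return n;
        }
    }

    public static void main(String[] args) {
        byte[] blob = new byte[100];
        // The supplier creates a NEW reader for every part, mirroring the
        // proposal to create a new IndexInput in the stream supplier function.
        Supplier<Reader> streamSupplier = () -> new Reader(blob);
        int partSize = 40;

        for (int part = 0; part < 3; part++) {
            Reader r = streamSupplier.get();  // fresh instance per part
            r.seek(part * partSize);          // independent position
            int want = Math.min(partSize, blob.length - part * partSize);
            byte[] dst = new byte[want];
            int n = r.read(dst);              // each part reads its full range
            System.out.println("part " + part + ": read " + n + " of " + want);
        }
    }
}
```

With independent instances, every part reads exactly its declared length, so the content-length check on each upload request is satisfied.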

Additional Details

Plugins
s3 plugin


Host/Environment (please complete the following information):
OS 2.15

