Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In-Memory Buffered S3 OutputStream #392

Closed
sgarfinkel opened this issue May 21, 2022 · 3 comments · Fixed by #400
Closed

In-Memory Buffered S3 OutputStream #392

sgarfinkel opened this issue May 21, 2022 · 3 comments · Fixed by #400
Labels
component: s3 S3 integration related issue type: enhancement Smaller enhancement in existing integration
Milestone

Comments

@sgarfinkel
Copy link
Contributor

Type: Feature

Is your feature request related to a problem? Please describe.
The S3Resource in the old spring-cloud-aws used to return an OutputStream that would buffer the content, and send the content (when it exceeded the buffer capacity), to S3 as a multipart upload. This is particularly useful for large streaming files where the user may not want to store the entire file on the filesystem. The current OutputStream(s) all write the content to an intermediate file on disk before sending it to S3.

Describe the solution you'd like
Implement an InMemoryBufferedS3OutputStream which uses a ByteBuffer to store written bytes. Flush the buffer (starting a multipart upload, if not yet started) and send the content of the buffer as each chunk of the multipart upload. Complete the multipart upload when close is called.

Describe alternatives you've considered
There isn't really one in the current implementation. If AbstractTempFileS3OutputStream used Path instead of File, there might be a way to do it with a custom filesystem implementation.

@github-actions github-actions bot added type: enhancement Smaller enhancement in existing integration status: waiting-for-triage Team has not yet looked into this issue labels May 21, 2022
@sgarfinkel sgarfinkel changed the title In-Memory Buffered S3 Output Stream In-Memory Buffered S3 OutputStream May 21, 2022
@maciejwalkowiak
Copy link
Contributor

Agreed that it would be nice to have. The benefit of storing first on disk is that in case of an upload failure, uploading part can be retried as it's possible to rewind the input stream.

At this stage, for v3 we will recommend using https://github.com/CI-CMG/aws-s3-outputstream and adding a custom S3OutputStreamProvider for this library.

We are also waiting for AWS to implement: aws/aws-sdk-java-v2#3128

Also, if you have an input stream you can fall back to using plain AWS SDK which offers uploading data from it.

@maciejwalkowiak maciejwalkowiak added component: s3 S3 integration related issue status: team-discussion Team has to figure out how to proceed and removed status: waiting-for-triage Team has not yet looked into this issue labels May 22, 2022
@sgarfinkel
Copy link
Contributor Author

sgarfinkel commented May 22, 2022

@maciejwalkowiak I have an implementation of this which is thread-safe with ReentrantLock that I can contribute if that would be helpful.

In terms of retry, I’m not sure why it would be an issue If a part fails you can retry it without failing the entire upload. If we buffer and copy to a byte array, we can even parallelize the multipart upload, and retry with a backoff mechanism.

@maciejwalkowiak
Copy link
Contributor

@sgarfinkel create a PR please and lets see!

@maciejwalkowiak maciejwalkowiak removed the status: team-discussion Team has to figure out how to proceed label Jun 3, 2022
@maciejwalkowiak maciejwalkowiak added this to the 3.0.0 M1 milestone Jun 3, 2022
maciejwalkowiak added a commit that referenced this issue Jun 6, 2022
Fixes #392 

Co-authored-by: Maciej Walkowiak <walkowiak.maciej@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: s3 S3 integration related issue type: enhancement Smaller enhancement in existing integration
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants