-
Notifications
You must be signed in to change notification settings - Fork 699
Upgrade to AWS Java SDK v2 #6165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
✅ Deploy Preview for nextflow-docs-staging ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
It is ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome. Made a few minor comments
plugins/nf-amazon/src/main/nextflow/cloud/aws/AwsClientFactory.groovy
Outdated
Show resolved
Hide resolved
plugins/nf-amazon/src/main/nextflow/cloud/aws/nio/S3Client.java
Outdated
Show resolved
Hide resolved
plugins/nf-amazon/src/main/nextflow/cloud/aws/nio/S3Client.java
Outdated
Show resolved
Hide resolved
plugins/nf-amazon/src/main/nextflow/cloud/aws/nio/S3Client.java
Outdated
Show resolved
Hide resolved
plugins/nf-amazon/src/main/nextflow/cloud/aws/nio/S3Client.java
Outdated
Show resolved
Hide resolved
plugins/nf-amazon/src/main/nextflow/cloud/aws/nio/S3Client.java
Outdated
Show resolved
Hide resolved
plugins/nf-amazon/src/main/nextflow/cloud/aws/nio/S3Client.java
Outdated
Show resolved
Hide resolved
I have found an issue with multi-part uploads when uploading large files. I move to draft until I fix it. |
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
…opy through tranfer manager Signed-off-by: jorgee <jorge.ejarque@seqera.io>
@jorgee can you make a quick summary about this effort? any blocking issues? |
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
@pditommaso, this is the summary that I breifly explained to @bentsherman, yesterday. I have been working on these two issues:
The solution that I have pushed in the latest commits is the following. In the case that a custom signer or user agent is specified, Nextflow will configure a Netty async client of the S3 Transfer Manager. A custom signer can be used including the implementation in the classpath and adding the FQDN of the custom signer class in the
Now, I am running the benchmarks to see if there are differences with the changes, but the code is again ready for review. |
What's CRT? 😄 |
CRT is the AWS Common Runtime. It is a C implementations of the AWS Client. |
OK, let's wait the result of the benchmark. However my advice is to focus on merging this PR with the core migration ignoring corner cases such Custom signer and user agent problem, and addressing them if needed in a separate PRs. Regarding the multipart upload problem, is it related with a specific config or a general one? |
Note that with the custom signer, if we don't support it in this PR then we have to deprecate it as it would no longer do anything |
Co-authored-by: Chris Hakkaart <chris.hakkaart@seqera.io> Signed-off-by: Jorge Ejarque <jorgee@users.noreply.github.com>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
I have pushed the changes to use only CRT version deprecating |
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments above.
I'm mainly confused about the use of the S3 transfer manager vs the old aws.client.upload*
config options. The fact that the old options are still used by newOutputStream()
, to a user this essentially means "sometimes we use the old options, sometimes we use the new options". It would be nice to deprecate these old options altogether.
It's also not clear to me how to select the async client vs sync client, as I don't see any explicit config option for this
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
docs/reference/config.md
Outdated
`aws.client.maxConcurrency` | ||
: :::{versionadded} 25.06.0-edge | ||
::: | ||
: The maximum number of concurrent S3 transfers when using the asynchronous S3 client. By default, this setting is determined by `aws.client.targetThroughputInGbps`. Modifying this value can affect the amount of memory used for S3 transfers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a nextflow user I have no idea what's the asynchronous S3. That's not something should be exposed in the documentation. If I have understand correctly this is related to the CRT vs Netty runtime. Is it correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked. Better to document these are using used by the data transfer component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind mentioning the async S3 client if there is also an option for selecting the async / sync client. But if we can make it so that Nextflow only uses one then we don't need to mention it in the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the text to refer to S3 transfer manager client. I have also added this section in the AWS part, to document when sync and async client is used.
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
The sync client is used in all api call except the S3 transfer manager actions ( S3 copies, uploads and downloads). For transfer manager there is no option to use the sync client. I didn't changed all calls to async because it implied a lot of changes in the code |
I have added a subsection in the aws part of the documentation mentioning the most important changes in the v1 to v2 migration |
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
This PR contains the changes to port the Amazon plugin to AWS SDK version 2. Find below the most relevant changes:
S3Client.withForceGlobalBucketAcessEnabled(flag)
. In v2, it set the following flagsS3Client.Builder.crossRegionAccessEnabled(flag)
andS3Configuration.multiRegionAccessEnabled(flag)
.-
AmazonS3Client.getS3AccountOwner()
is not available in SDK v2. It was providing an ID used for checking the file access. In V2, the only way to retrieve the same ID is from a bucket owned by the user. To do it we need to list the buckets and get the owner field in the GetBucketACLResponse. If it is not possible to retrieve the ID because the user does not own any bucket, we perform the following fallback. In the case of READ access, it tries to retrieve the head of the object, It will fail if there isn't read access. In the case of writting, a warning is printed. It is the same as AWS NIO is doing to check the file access.The
setEndpoint
andsetRegion
methods in the S3Client wrapper are removed as it is not available in the v2 clients. They were only used in tests.CannedAccessControlList
is split in two classes one for objects and another for buckets. In most of the code it has been substituted by ObjectCannedACL.ContentType
andContentLength
are part of the request instead of theObjectMetadata
, and they can be obtained invoking the S3client.headObject method in the SDK v2S3ClientConfiguration
doesn't exist in SDK v2. Two new classes have been created to emulate the same behaviour. They convert the properties to the SDK v2 sync and async client configurations.SsoCredentialsProviderV1
class is not needed anymore as SDK v2 already manages the SSO credentials. The custom provider chain created in theS3FileSystemProvider.getCredetialsProvider0
to include theSsoCredentialsProviderV1
ihas been replace by theDefaultCredentialProvider
in v2.Credentials and config are automatically merged by SDK v2. No option for NXF_DISABLE_AWS_CONFIG_MERGE.
In V2, clients and requests are immutable and must be generated with a builder class. Some helper methods have been modified to pass builder classes instead of requests, such as
makeJobDefRequest
,configJobRefRequest
,addVolumeMountsToContainer
, etc.S3 Parallel Download was deprecated and S3CopyStream was not used. They have been removed.
In v1, the upload directory was performed by walking through the different directory files and uploading them one by one. In v2, it has been substituted by the uploadDirectory method in the SDK.