Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 multipart upload with Length::UpTo appears to be broken #821

Closed
jdisanti opened this issue Jun 6, 2023 · 11 comments
Closed

S3 multipart upload with Length::UpTo appears to be broken #821

jdisanti opened this issue Jun 6, 2023 · 11 comments
Assignees
Labels
bug This issue is a bug. p2 This is a standard priority issue pending-release This issue will be fixed by an approved PR that hasn't been released yet.

Comments

@jdisanti
Copy link
Contributor

jdisanti commented Jun 6, 2023

Describe the bug

When doing a multipart upload to S3, I should be able to use ByteStream's Length::UpTo, but that fails on the last chunk.

Expected Behavior

It should upload the last part of the file even though it's not a full sized part.

Current Behavior

In my test, I was uploading a file with 12345 bytes in parts of 1000 bytes, so the last part should only be 345 bytes long. The 655 in the error below is 1000 - 345:

dispatch failure: user error: user body write aborted: early end, expected 655 more bytes: early end, expected 655 more bytes (DispatchFailure(DispatchFailure { source: ConnectorError { kind: User, source: hyper::Error(User(BodyWriteAborted), NotEof(655)), connection: Unknown } }))

Note: The minimum part size is 5 MiB, so my example part sizes above won't succeed in the CompleteMultipartUpload call, but the individual UploadPart calls should succeed.

Reproduction Steps

#[derive(Debug, clap::Parser)]
#[command()]
struct Args {
    #[arg(long)]
    size_bytes: u64,
    #[arg(long)]
    part_size_bytes: u64,
    #[arg(long)]
    bucket: String,
}

async fn put_object_multipart(
    client: &s3::Client,
    args: &Args,
    path: &Path,
) -> Result<(), BoxError> {
    let upload_id = client
        .create_multipart_upload()
        .bucket(&args.bucket)
        .key(TEST_KEY)
        .send()
        .await?
        .upload_id
        .expect("missing upload id");

    let mut parts = Vec::new();
    let part_count = args.size_bytes / args.part_size_bytes + 1;
    for part in 0..part_count {
        let offset = args.part_size_bytes * part;
        let stream = ByteStream::read_from()
            .path(path)
            .offset(offset)
            .length(Length::UpTo(args.part_size_bytes))
            .build()
            .await?;
        let part_output = client
            .upload_part()
            .key(TEST_KEY)
            .bucket(&args.bucket)
            .upload_id(&upload_id)
            .body(stream)
            .part_number(part as i32 + 1) // S3 takes a 1-based index
            .send()
            .await?;
        parts.push(
            CompletedPart::builder()
                .part_number(part as i32 + 1)
                .e_tag(part_output.e_tag.expect("must have an e-tag"))
                .build(),
        );
    }
    client
        .complete_multipart_upload()
        .bucket(&args.bucket)
        .key(BENCH_KEY)
        .upload_id(&upload_id)
        .multipart_upload(
            CompletedMultipartUpload::builder()
                .set_parts(Some(parts))
                .build(),
        )
        .send()
        .await?;

    Ok(())
}

Possible Solution

No response

Additional Information/Context

No response

Version

├── aws-config v0.55.3
│   ├── aws-credential-types v0.55.3
│   │   ├── aws-smithy-async v0.55.3
│   │   ├── aws-smithy-types v0.55.3
│   ├── aws-http v0.55.3
│   │   ├── aws-credential-types v0.55.3 (*)
│   │   ├── aws-smithy-http v0.55.3
│   │   │   ├── aws-smithy-eventstream v0.55.3
│   │   │   │   ├── aws-smithy-types v0.55.3 (*)
│   │   │   ├── aws-smithy-types v0.55.3 (*)
│   │   ├── aws-smithy-types v0.55.3 (*)
│   │   ├── aws-types v0.55.3
│   │   │   ├── aws-credential-types v0.55.3 (*)
│   │   │   ├── aws-smithy-async v0.55.3 (*)
│   │   │   ├── aws-smithy-client v0.55.3
│   │   │   │   ├── aws-smithy-async v0.55.3 (*)
│   │   │   │   ├── aws-smithy-http v0.55.3 (*)
│   │   │   │   ├── aws-smithy-http-tower v0.55.3
│   │   │   │   │   ├── aws-smithy-http v0.55.3 (*)
│   │   │   │   │   ├── aws-smithy-types v0.55.3 (*)
│   │   │   │   ├── aws-smithy-types v0.55.3 (*)
│   │   │   ├── aws-smithy-http v0.55.3 (*)
│   │   │   ├── aws-smithy-types v0.55.3 (*)
│   ├── aws-sdk-sso v0.28.0
│   │   ├── aws-credential-types v0.55.3 (*)
│   │   ├── aws-endpoint v0.55.3
│   │   │   ├── aws-smithy-http v0.55.3 (*)
│   │   │   ├── aws-smithy-types v0.55.3 (*)
│   │   │   ├── aws-types v0.55.3 (*)
│   │   ├── aws-http v0.55.3 (*)
│   │   ├── aws-sig-auth v0.55.3
│   │   │   ├── aws-credential-types v0.55.3 (*)
│   │   │   ├── aws-sigv4 v0.55.3
│   │   │   │   ├── aws-smithy-eventstream v0.55.3 (*)
│   │   │   │   ├── aws-smithy-http v0.55.3 (*)
│   │   │   ├── aws-smithy-eventstream v0.55.3 (*)
│   │   │   ├── aws-smithy-http v0.55.3 (*)
│   │   │   ├── aws-types v0.55.3 (*)
│   │   ├── aws-smithy-async v0.55.3 (*)
│   │   ├── aws-smithy-client v0.55.3 (*)
│   │   ├── aws-smithy-http v0.55.3 (*)
│   │   ├── aws-smithy-http-tower v0.55.3 (*)
│   │   ├── aws-smithy-json v0.55.3
│   │   │   └── aws-smithy-types v0.55.3 (*)
│   │   ├── aws-smithy-types v0.55.3 (*)
│   │   ├── aws-types v0.55.3 (*)
│   ├── aws-sdk-sts v0.28.0
│   │   ├── aws-credential-types v0.55.3 (*)
│   │   ├── aws-endpoint v0.55.3 (*)
│   │   ├── aws-http v0.55.3 (*)
│   │   ├── aws-sig-auth v0.55.3 (*)
│   │   ├── aws-smithy-async v0.55.3 (*)
│   │   ├── aws-smithy-client v0.55.3 (*)
│   │   ├── aws-smithy-http v0.55.3 (*)
│   │   ├── aws-smithy-http-tower v0.55.3 (*)
│   │   ├── aws-smithy-json v0.55.3 (*)
│   │   ├── aws-smithy-query v0.55.3
│   │   │   ├── aws-smithy-types v0.55.3 (*)
│   │   ├── aws-smithy-types v0.55.3 (*)
│   │   ├── aws-smithy-xml v0.55.3
│   │   ├── aws-types v0.55.3 (*)
│   ├── aws-smithy-async v0.55.3 (*)
│   ├── aws-smithy-client v0.55.3 (*)
│   ├── aws-smithy-http v0.55.3 (*)
│   ├── aws-smithy-http-tower v0.55.3 (*)
│   ├── aws-smithy-json v0.55.3 (*)
│   ├── aws-smithy-types v0.55.3 (*)
│   ├── aws-types v0.55.3 (*)
├── aws-sdk-s3 v0.28.0
│   ├── aws-credential-types v0.55.3 (*)
│   ├── aws-endpoint v0.55.3 (*)
│   ├── aws-http v0.55.3 (*)
│   ├── aws-sig-auth v0.55.3 (*)
│   ├── aws-sigv4 v0.55.3 (*)
│   ├── aws-smithy-async v0.55.3 (*)
│   ├── aws-smithy-checksums v0.55.3
│   │   ├── aws-smithy-http v0.55.3 (*)
│   │   ├── aws-smithy-types v0.55.3 (*)
│   ├── aws-smithy-client v0.55.3 (*)
│   ├── aws-smithy-eventstream v0.55.3 (*)
│   ├── aws-smithy-http v0.55.3 (*)
│   ├── aws-smithy-http-tower v0.55.3 (*)
│   ├── aws-smithy-json v0.55.3 (*)
│   ├── aws-smithy-types v0.55.3 (*)
│   ├── aws-smithy-xml v0.55.3 (*)
│   ├── aws-types v0.55.3 (*)
├── aws-smithy-http v0.55.3 (*)

Environment details (OS name and version, etc.)

MacOS Ventura

Logs

No response

@jdisanti jdisanti added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 6, 2023
@Velfi Velfi removed the needs-triage This issue or PR still needs to be triaged. label Jun 6, 2023
@rcoh rcoh added the p2 This is a standard priority issue label Aug 8, 2023
@RAJAGOPALAN-GANGADHARAN

Hey @jdisanti I'm new to rust but looking to contribute to a project and increase my rust knowledge. This seems like a good issue to start, do you think I can work on this? Thanks!

@Velfi
Copy link
Contributor

Velfi commented Oct 16, 2023

@RAJAGOPALAN-GANGADHARAN, you can have a go at this but we may be slow to review your changes b/c we're quite busy.

A good place to start would be by writing a failing integration test and then updating ByteStream as necessary. We'd very much prefer a solution that doesn't require breaking changes. Are you familiar with how the project's codegen and testing work? If not, check out the README; It's a complex project and not so easy to work with. Also, don't forget to run pre-commit install to set up the pre-commit checks. Otherwise, CI will reject your PRs for formatting and linter errors.

@RAJAGOPALAN-GANGADHARAN
Copy link

Hi @Velfi thanks for responding back. SO I have setup the project, and made a sample unit test to reproduce. As part of this I want to setup a faster debug-run-loop, Currently I do:

./gradlew -Paws.services=+sts,+sso,+ssooidc,+s3 :aws:sdk:assemble
# To run test
cd ~/OpenSource/smithy-rs/aws/sdk/build/aws-sdk
cargo test --all-features

This seems to rebuild everything, when I make a change to the test in s3 I do assemble. My assumption is that while building s3 it also have to rebuild sts etc. which triggers the rebuild of every other components while running tests. I would like to run tests for s3 specifically. how can I achieve that? Thanks!

This seems to

@Velfi
Copy link
Contributor

Velfi commented Nov 6, 2023

You may not need to be including ssooidc in your services list but otherwise your command looks good. If you're only wanting to test s3, change your cargo command to this:

cargo test --all-features --package aws-sdk-s3

@rcoh
Copy link
Contributor

rcoh commented Nov 6, 2023

You need ssoidc to satisfy the versioner. My usual loop, fwiw is this:

./gradlew :aws:sdk:assemble -P aws.services=+sts,+sso,+s3,+ssooidc && (cd ../sdk-snapshot/sdk/s3 && cargo check --all-features --all-targets)

I have ../sdk-snapshot as a symlink to the build directory. (Of course, change cargo check to what you want)

@RAJAGOPALAN-GANGADHARAN
Copy link

cargo test --all-features --package aws-sdk-s3

Ah I was trying them both individually but not together as the names were slightly confusing, But yeah let me try this.

Thanks Velfi and rcoh, will try them out.

@RAJAGOPALAN-GANGADHARAN
Copy link

RAJAGOPALAN-GANGADHARAN commented Nov 25, 2023

raj@LAPTOP-HBU4JTTC:~/OpenSource/smithy-rs/aws/sdk/build/aws-sdk$ cargo test --all-features --package aws-sdk-s3
   Compiling aws-smithy-async v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-async)
   Compiling aws-types v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-types)
   Compiling aws-smithy-types v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-types)
   Compiling aws-smithy-xml v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-xml)
   Compiling aws-smithy-runtime-api v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-runtime-api)
   Compiling aws-smithy-eventstream v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-eventstream)
   Compiling aws-smithy-json v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-json)
   Compiling aws-smithy-query v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-query)
   Compiling aws-smithy-http v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-http)
   Compiling aws-credential-types v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-credential-types)
   Compiling aws-smithy-protocol-test v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-protocol-test)
   Compiling aws-sigv4 v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-sigv4)
   Compiling aws-smithy-checksums v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-checksums)
   Compiling aws-http v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-http)
   Compiling aws-smithy-runtime v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-smithy-runtime)
   Compiling aws-runtime v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-runtime)
   Compiling aws-sdk-sso v0.0.0-local (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/sso)
   Compiling aws-sdk-ssooidc v0.0.0-local (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/ssooidc)
   Compiling aws-sdk-sts v0.0.0-local (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/sts)
   Compiling aws-sdk-s3 v0.0.0-local (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/s3)
   Compiling aws-config v0.56.1 (/home/raj/OpenSource/smithy-rs/aws/sdk/build/aws-sdk/sdk/aws-config)
error[E0425]: cannot find function `join_all` in module `future`
   --> sdk/s3/tests/concurrency.rs:221:31
    |
221 |     let res: Vec<_> = future::join_all(futures).await;
    |                               ^^^^^^^^ not found in `future`

For more information about this error, try `rustc --explain E0425`.
error: could not compile `aws-sdk-s3` (test "concurrency") due to previous error
warning: build failed, waiting for other jobs to finish...

When I do

cargo test --all-features --package aws-sdk-s3

I think some test specific dependency is missing specifically for s3, which is getting resolved when run at workspace level.

However I realized for getting the tests to run, I could just modify the tests directly in the build directory and let incremental build take care of this Combined this with running specific tests my reproduction setup is faster. There by it now recompiles only my changed file, this might not work when we want to regenerate the sdk (maybe some sort of selected copying to build work here).

@rcoh
Copy link
Contributor

rcoh commented Nov 25, 2023 via email

@pablosichert
Copy link

I tried to reproduce the initial bug report with the following snippet that I stitched together from various AWS examples:

use aws_sdk_s3::config::Credentials;
use aws_sdk_s3::operation::create_multipart_upload::CreateMultipartUploadOutput;
use aws_sdk_s3::types::CompletedPart;
use aws_sdk_s3::Config;
use aws_sdk_s3::{config::Region, Client as S3Client};
use aws_smithy_types::byte_stream::{ByteStream, Length};
use rand::distributions::Alphanumeric;
use rand::{thread_rng, Rng};
use std::fs::File;
use std::io::prelude::*;
use std::path::Path;

const CHUNK_SIZE: u64 = 1024 * 1024 * 5;
const MAX_CHUNKS: u64 = 10000;

#[tokio::main]
pub async fn main() {
    let access_key_id = "..."; // E.g. "XXXXXXXXXXXXXXXXXXXX"
    let secret_access_key = "..."; // E.g. "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    let region = "..."; // E.g. "us-east-1"
    let bucket_name = "..."; // E.g. "my-bucket"
    let endpoint_url = "..."; // E.g. &format!("https://{bucket_name}.s3.{region}.amazonaws.com");
    let key = "..."; // E.g. "my-key"

    let client = S3Client::from_conf(
        Config::builder()
            .behavior_version_latest()
            .endpoint_url(endpoint_url)
            .region(Region::from_static(region))
            .force_path_style(true)
            .credentials_provider(Credentials::new(
                access_key_id,
                secret_access_key,
                None,
                None,
                "",
            ))
            .build(),
    );

    let multipart_upload_res: CreateMultipartUploadOutput = client
        .create_multipart_upload()
        .bucket(bucket_name)
        .key(key)
        .send()
        .await
        .unwrap();

    let upload_id = multipart_upload_res.upload_id().unwrap();

    println!("upload_id: {upload_id}");

    let mut file = File::create(key).expect("Could not create sample file.");
    while file.metadata().unwrap().len() <= CHUNK_SIZE * 4 {
        let rand_string: String = thread_rng()
            .sample_iter(&Alphanumeric)
            .take(256)
            .map(char::from)
            .collect();
        let return_string: String = "\n".to_string();
        file.write_all(rand_string.as_ref())
            .expect("Error writing to file.");
        file.write_all(return_string.as_ref())
            .expect("Error writing to file.");
    }

    println!("File created");

    let path = Path::new(&key);
    let file_size = tokio::fs::metadata(path).await.unwrap().len();

    let chunk_count = (file_size / CHUNK_SIZE) + 1;

    if file_size == 0 {
        panic!("Bad file size.");
    }
    if chunk_count > MAX_CHUNKS {
        panic!("Too many chunks! Try increasing your chunk size.")
    }

    let mut upload_parts: Vec<CompletedPart> = Vec::new();

    for chunk_index in 0..chunk_count {
        let stream = ByteStream::read_from()
            .path(path)
            .offset(chunk_index * CHUNK_SIZE)
            .length(Length::UpTo(CHUNK_SIZE))
            .build()
            .await
            .unwrap();

        let part_number = (chunk_index as i32) + 1;
        let upload_part_res = client
            .upload_part()
            .key(key)
            .bucket(bucket_name)
            .upload_id(upload_id)
            .body(stream)
            .part_number(part_number)
            .send()
            .await
            .unwrap();

        let e_tag = upload_part_res.e_tag.unwrap_or_default();

        println!("part_number: {part_number}, e_tag: {e_tag}");

        upload_parts.push(
            CompletedPart::builder()
                .e_tag(e_tag)
                .part_number(part_number)
                .build(),
        );
    }
}

When running it, I get essentially the same error:

upload_id: ijo3ab3f7A8EyS4Y28tpgaXtALbM0KryJeRBWHGt9JEx4TSD__1IJReIS4rnTIJ3xk1FnWdptj9o8KSWI8nshfigBVQlhudnDvPKW4vOqs74IGr.A_TB56kbyn7agoxU
File created
part_number: 1, e_tag: "4ec1de1e3e9485e26d96bc534d5d7596"
part_number: 2, e_tag: "57ff3b87d394a222b09c97c872312e6b"
part_number: 3, e_tag: "ee911610ee7d3065849f0fa9ddc5960e"
part_number: 4, e_tag: "c661880584bd6bdec56a486d404e50e9"
thread 'main' panicked at src/example.rs:102:14:
called `Result::unwrap()` on an `Err` value: DispatchFailure(DispatchFailure { source: ConnectorError { kind: User, source: hyper::Error(User(BodyWriteAborted), NotEof(5242686)), connection: Unknown } })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The Length::UpTo seems to set the Content-Length header in the HTTP request. However, the last request includes less bytes than advertised, so I'm not entirely surprised that hyper returns this error here.

This is fixed in the latest version

@rcoh could you point me to a release where this is fixed, or how to correct my example?

@rcoh
Copy link
Contributor

rcoh commented Feb 12, 2024

ah! my comment referred to only the join_all issue—the main "UpTo" problem is still open.

@ysaito1001 ysaito1001 self-assigned this Aug 20, 2024
github-merge-queue bot pushed a commit to smithy-lang/smithy-rs that referenced this issue Aug 21, 2024
…le file length (#3797)

## Motivation and Context
awslabs/aws-sdk-rust#821

## Description
This PR addresses an issue with the
[Length::UpTo](https://docs.rs/aws-smithy-types/1.2.2/aws_smithy_types/byte_stream/enum.Length.html)
usage in `FsBuilder`. Previously, if a value specified for `UpTo`
exceeded the remaining file length, `FsBuilder` would incorrectly accept
the value. This discrepancy led to failures in subsequent request
dispatches, as the actual body size did not match the advertised
`Content-Length`, as explained
[here](awslabs/aws-sdk-rust#821 (comment))
(thank you @pablosichert for a self-contained reproducer and problem
analysis!).

## Testing
- Added a unit test for `FsBuilder` verifying the `Length::UpTo` usage
- Ran successfully a customer provided
[reproducer](awslabs/aws-sdk-rust#821 (comment))
with the code changes in this PR (with an added a call to
`complete_multipart_upload` at the end, it also succeeded in uploading
the object):
```
upload_id: cTDSngbubD25cOoFCNgjpG55o0hAMQNjO16dNFyNTKjg9PEtkcrKG5rTGzBns7CXoO8T.Qm9GpNj6jgwJTKcXDpsca95wSMWMDfPF0DBhmbk3OAGHuuGM1E70spk2suW
File created
part_number: 1, e_tag: "5648ddf58c7c90a788d7f16717a61b08"
part_number: 2, e_tag: "a6bdad6d65d18d842ef1d57ca4673bc3"
part_number: 3, e_tag: "f518f6b19b255ec49b61d511288554fc"
part_number: 4, e_tag: "1496524801eb1d0a7cfbe608eb037d9c"
part_number: 5, e_tag: "21340de04927ce1bed58ad0375c03e01"
```

## Checklist
- [x] For changes to the smithy-rs codegen or runtime crates, I have
created a changelog entry Markdown file in the `.changelog` directory,
specifying "client," "server," or both in the `applies_to` key.
- [x] For changes to the AWS SDK, generated SDK code, or SDK runtime
crates, I have created a changelog entry Markdown file in the
`.changelog` directory, specifying "aws-sdk-rust" in the `applies_to`
key.

----

_By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice._
@ysaito1001 ysaito1001 added the pending-release This issue will be fixed by an approved PR that hasn't been released yet. label Aug 21, 2024
@jmklix jmklix closed this as completed Aug 30, 2024
Copy link

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p2 This is a standard priority issue pending-release This issue will be fixed by an approved PR that hasn't been released yet.
Projects
None yet
Development

No branches or pull requests

7 participants