fix context timeout associated with imds apis #1732

Merged: 1 commit, Mar 22, 2024

ndbaker1 (Member) commented Mar 19, 2024

Issue #, if available:

Description of changes:

The ec2imds APIs fail fast in the GetUserData call because of a context-driven error:

operation error ec2imds: GetUserData, canceled, context deadline exceeded

Note that we only pass context.TODO() into this call, but the underlying middleware for the AWS SDK call to IMDS adds its own deadline if one is missing or trivially set to 0.
https://github.com/aws/aws-sdk-go-v2/blob/main/feature/ec2/imds/request_middleware.go#L268-L280

The timeout override that gets used is 5 seconds.

This PR sets a context timeout greater than the retry period.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

Added e2e tests to help with this validation.

See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.

cartermckinnon (Member) commented Mar 20, 2024

This really sounds like a bug in the SDK to me, we're not the first ones to run into it: aws/aws-sdk-go-v2#1247

IIUC the SDK is overriding the context we pass and as a result, making our retry options useless?

Comment on lines -105 to +107
- imds-mock --config-file $CONFIG_PATH &
+ [ "${IMDS_MOCK_ONLY_CONFIGURE:-}" = "true" ] || imds-mock --config-file $CONFIG_PATH &
  export AWS_EC2_METADATA_SERVICE_ENDPOINT=http://localhost:1338
- $HOME/.local/bin/moto_server -p5000 &
+ [ "${AWS_MOCK_ONLY_CONFIGURE:-}" = "true" ] || $HOME/.local/bin/moto_server -p5000 &
Member

maybe just split this into 2 helpers, mock::imds and mock::aws? I combined them so that our standard set of mocks didn't get unruly, but it's not that big of a deal

ndbaker1 (Member Author), Mar 20, 2024

I was considering it, and I would probably still use an ONLY_CONFIGURE env var so that I can enable the service partway through the run, if that's fine.
I could just export the variable myself, but I figured it's cleaner to use the mock as the mechanism.

Member Author

There's not much pressure since this is a pretty unique test, so let's not worry about it for now :)

ndbaker1 (Member Author) commented Mar 20, 2024

> This really sounds like a bug in the SDK to me, we're not the first ones to run into it: aws/aws-sdk-go-v2#1247
> IIUC the SDK is overriding the context we pass and as a result, making our retry options useless?

Right, the underlying middleware for this client pretty much throws a context timeout onto your request, xref.
Not sure what they were thinking, since it's explicitly added here, but it's not something I would expect the SDK to do.

ndbaker1 (Member Author) commented

Updated the SDK to the versions with the fix, then also moved the userdata helper into the imds wrapper.

    DisableDefaultTimeout: true,
    Retryer: retry.NewStandard(func(so *retry.StandardOptions) {
        so.MaxAttempts = 15
        so.MaxBackoff = 1 * time.Second
Member

this means 1 second between each attempt, right?

Member Author

Yeah, the underlying exponential retryer just behaves linearly with a 1-second max backoff.

cartermckinnon (Member) commented

/ci

Contributor

@cartermckinnon roger that! I've dispatched a workflow. 👍

Contributor

@cartermckinnon the workflow that you requested has completed. 🎉

| AMI variant | Build | Test |
|---|---|---|
| 1.21 / al2 | success ✅ | failure ❌ |
| 1.22 / al2 | success ✅ | success ✅ |
| 1.23 / al2 | success ✅ | success ✅ |
| 1.23 / al2023 | success ✅ | success ✅ |
| 1.24 / al2 | success ✅ | success ✅ |
| 1.24 / al2023 | success ✅ | success ✅ |
| 1.25 / al2 | success ✅ | success ✅ |
| 1.25 / al2023 | success ✅ | success ✅ |
| 1.26 / al2 | success ✅ | failure ❌ |
| 1.26 / al2023 | success ✅ | success ✅ |
| 1.27 / al2 | success ✅ | success ✅ |
| 1.27 / al2023 | success ✅ | success ✅ |
| 1.28 / al2 | success ✅ | success ✅ |
| 1.28 / al2023 | success ✅ | success ✅ |
| 1.29 / al2 | success ✅ | success ✅ |
| 1.29 / al2023 | success ✅ | success ✅ |

cartermckinnon (Member) left a comment

LGTM

Issacwww merged commit c674960 into awslabs:main on Mar 22, 2024
10 checks passed
ndbaker1 deleted the imds-fix branch on March 22, 2024 23:33
atmosx pushed a commit to gathertown/amazon-eks-ami that referenced this pull request Jun 18, 2024