Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

harden pull-sandbox-image script #1649

Merged
merged 2 commits into from
Feb 10, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 17 additions & 8 deletions files/pull-sandbox-image.sh
Original file line number Diff line number Diff line change
@@ -1,16 +1,25 @@
#!/usr/bin/env bash
set -euo pipefail

source <(grep "sandbox_image" /etc/containerd/config.toml | tr -d ' ')

### skip if we don't have a sandbox_image set in config.toml
if [[ -z ${sandbox_image:-} ]]; then
echo >&2 "Skipping ... missing sandbox_image from /etc/containerd/config.toml"
exit 0
fi

### Short-circuit fetching sandbox image if its already present
if [[ "$(sudo ctr --namespace k8s.io image ls | grep $sandbox_image)" != "" ]]; then
if [[ -n $(sudo ctr --namespace k8s.io image ls | grep "${sandbox_image}") ]]; then
echo >&2 "Skipping ... sandbox_image '${sandbox_image}' is already present"
exit 0
fi

# use the region that the sandbox image comes from for the ecr authentication,
# also mitigating the localzone isse: https://github.com/aws/aws-cli/issues/7043
region=$(echo "${sandbox_image}" | cut -f4 -d ".")
# if the sandbox image is provided by the bootstrap script, then the region is
# guaranteed to come from this data source.
# see: https://github.com/awslabs/amazon-eks-ami/blob/baef6f0860f60dbec366de30853e47418e3fb430/files/bootstrap.sh#L320-L338
# if the image is customer provided, then this is just a sane default for the
# region when attempting to get ecr credentials.
region=$(imds 'latest/dynamic/instance-identity/document' | jq .region -r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one problem here is ... if the sandbox_image is specified from another region ( different from where the node is running in :( )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we check if the echo/cut version (from the sandbox_image) is present in aws account list-regions? and skip if not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh but this would require new permissions on the role 😖

Copy link
Member Author

@ndbaker1 ndbaker1 Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there docs stating required naming conventions of ecr registries? I'm not 100% sure what i can/can't check for

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the sandbox_image is specified from another region

but this is only for user provided images right (based on the assumptions for the eks generated uri), which I'm not sure had a solution even prior to this 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we can let it slide and leave it to the imds variation. (cross fingers!)

Copy link
Member

@cartermckinnon cartermckinnon Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to touch this in this PR. Cutting the region out of the image reference has its caveats, for sure; but that logic has been in prod for over a year and it's proven solid. I think we should consider this change in a separate PR, since this one is mostly for mitigation

Copy link
Member Author

@ndbaker1 ndbaker1 Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that logic has been in prod for over a year and it's proven solid.

This logic has existed yes (https://github.com/awslabs/amazon-eks-ami/blame/master/files/pull-image.sh#L4), but its only been used in context where the image source was already known and decided by us right?

The only prior usage i know of is in

if /etc/eks/containerd/pull-image.sh "${img}"; then

Copy link
Member

@cartermckinnon cartermckinnon Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull-sandbox-image.sh used to call pull-image.sh instead of using crictl: 7fa037a#diff-57a6aadbbb1d3df65f4675ae80c562f7e406bcb11e41f6afb974043a2ede0aa0L11

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 that escaped my mind (and my github searches)
yea i think pulling it out makes sense then since it won't break the expectation beyond whats already changed in last commit


MAX_RETRIES=3

Expand All @@ -29,9 +38,9 @@ function retry() {
done
}

ecr_password=$(retry aws ecr get-login-password --region $region)
# for public, non-ecr repositories even if this fails to get ECR credentials the image will pull
ecr_password=$(retry aws ecr get-login-password --region "${region}")
if [[ -z ${ecr_password} ]]; then
echo >&2 "Unable to retrieve the ECR password."
exit 1
echo >&2 "Unable to retrieve the ECR password. Image pull may not be properly authenticated."
fi
retry sudo crictl pull --creds "AWS:${ecr_password}" "${sandbox_image}"
Loading