-
Notifications
You must be signed in to change notification settings - Fork 316
Reduce build image components templates size #6910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
cc68643
to
863bf90
Compare
1769061
to
7209724
Compare
) | ||
) | ||
assert_that( | ||
len(str(imagebuilder_resources.get(component_name).get("Properties").get("Data"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the length of the component will always remain the same.
Because this component has not replaced the parameters that we pass and which are replaced by Cfn Function Sub.So we need to test that by replacing these paramters in the Data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just keep it as is. Since the test limits the length to under 13600, it leave 2400 for all parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
@@ -13,205 +13,91 @@ constants: | |||
phases: | |||
- name: build | |||
steps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you describe the changes made in parallelcluster.yaml
its difficult to follow through and compare without guidance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added more comments to describe the changes and hope they will help
# Get Cookbook Url | ||
- name: CookbookUrl | ||
# Initialize system information and URLs | ||
- name: SystemInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really smart!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! AI did it
# Get Cookbook name | ||
- name: PClusterCookbookVersionName | ||
# Get operating system name for variable chaining | ||
- name: OperatingSystemName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not combine OperatingSystemName and SystemInfo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OperatingSystemName
is used also in cli/src/pcluster/config/imagebuilder_config.py as part of the parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we set this OS in the image_dna.json when Cookbook could essentially set that for us?
These defaults are used for image_dna.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh thats to validate OS in cookbook!
echo ${!PLATFORM} | ||
|
||
# Get input base AMI Architecture | ||
- name: OperatingSystemArchitecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not remove this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK. This commit aims at retaining the logic. So I didn't dig into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we dont retain this though or did i miss something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Get AWS Region | ||
- name: AWSRegion | ||
action: ExecuteBash | ||
inputs: | ||
commands: | ||
- | | ||
set -v | ||
echo ${AWS::Region} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer needed after this change https://github.com/aws/aws-parallelcluster/pull/6910/files#diff-64aaeae41ccc76ceed59ed96a79d7754dbb96617d7a7768b6a7cb1c8ad7e2cc0L343-R343
RELEASE='{{ build.OperatingSystemRelease.outputs.stdout }}' | ||
|
||
if [ `echo "${!RELEASE}" | grep -w '^amzn\.2'` ]; then | ||
OS='alinux2' | ||
elif [ `echo "${!RELEASE}" | grep -w '^amzn\.2023'` ]; then | ||
OS='alinux2023' | ||
elif [ `echo "${!RELEASE}" | grep '^ubuntu\.22'` ]; then | ||
OS='ubuntu2204' | ||
elif [ `echo "${!RELEASE}" | grep '^ubuntu\.24'` ]; then | ||
OS='ubuntu2404' | ||
elif [ `echo "${!RELEASE}" | grep '^rhel\.8'` ]; then | ||
OS='rhel8' | ||
elif [ `echo "${!RELEASE}" | grep '^rocky\.8'` ]; then | ||
OS='rocky8' | ||
elif [ `echo "${!RELEASE}" | grep '^rhel\.9'` ]; then | ||
OS='rhel9' | ||
elif [ `echo "${!RELEASE}" | grep '^rocky\.9'` ]; then | ||
OS='rocky9' | ||
else | ||
echo "Operating System '${!RELEASE}' is not supported. Failing build." | ||
exit {{ FailExitCode }} | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
. /etc/os-release | ||
RELEASE="${!ID}${!VERSION_ID:+.${!VERSION_ID}}" | ||
case "$RELEASE" in | ||
amzn.2) echo 'alinux2' ;; | ||
amzn.2023) echo 'alinux2023' ;; | ||
ubuntu.22*) echo 'ubuntu2204' ;; | ||
ubuntu.24*) echo 'ubuntu2404' ;; | ||
rhel.8*) echo 'rhel8' ;; | ||
rocky.8*) echo 'rocky8' ;; | ||
rhel.9*) echo 'rhel9' ;; | ||
rocky.9*) echo 'rocky9' ;; | ||
*) echo "Unsupported OS: $RELEASE" && exit {{ FailExitCode }} ;; | ||
esac |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echo ${!PLATFORM} | ||
|
||
# Get input base AMI Architecture | ||
- name: OperatingSystemArchitecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK. This commit aims at retaining the logic. So I didn't dig into it
- name: IsOperatingSystemSupported | ||
action: ExecuteBash | ||
inputs: | ||
commands: | ||
- | | ||
set -v | ||
if [ ${CfnParamUpdateOsAndReboot} == false ]; then | ||
RELEASE='{{ build.OperatingSystemRelease.outputs.stdout }}' | ||
if [ `echo "${!RELEASE}" | grep -Ev '^(amzn|ubuntu|rhel|rocky)'` ]; then | ||
echo "This component does not support '${!RELEASE}'. Failing build." | ||
exit {{ FailExitCode }} | ||
fi | ||
|
||
# This component only supports aarch64 CPUs on Amazon Linux 2, Ubuntu2204, RHEL8, Rocky8, RHEL9 and Rocky9 | ||
ARCH=$(uname -m) | ||
if [[ `echo ${!ARCH}` == 'aarch64' ]]; then | ||
if [ `echo "${!RELEASE}" | grep -Ev '^(amzn\.2|amzn\.2023|ubuntu\.20\.04|ubuntu\.22\.04|ubuntu\.24\.04|rhel\.8|rocky\.8|rhel\.9|rocky\.9)'` ]; then | ||
echo "This component does not support '${!RELEASE}' on ARM64 CPUs. Failing build." | ||
exit {{ FailExitCode }} | ||
fi | ||
fi | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed because the check is redundant with https://github.com/aws/aws-parallelcluster/pull/6910/files#diff-365de0075f4659870e5ed9878016b28a29065c60bc7bb763da54670e5b626eb6R23-R35
# This component only supports aarch64 CPUs on Amazon Linux 2, Ubuntu2204, RHEL8, Rocky8, RHEL9 and Rocky9 | ||
ARCH=$(uname -m) | ||
if [[ `echo ${!ARCH}` == 'aarch64' ]]; then | ||
if [ `echo "${!RELEASE}" | grep -Ev '^(amzn\.2|amzn\.2023|ubuntu\.20\.04|ubuntu\.22\.04|ubuntu\.24\.04|rhel\.8|rocky\.8|rhel\.9|rocky\.9)'` ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove these checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why were these checks added when previous steps already checked the OS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was it centos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- name: OperatingSystemRelease | ||
action: ExecuteBash | ||
inputs: | ||
commands: | ||
- | | ||
set -v | ||
FILE=/etc/os-release | ||
if [ -e ${!FILE} ]; then | ||
. ${!FILE} | ||
echo "${!ID}${!VERSION_ID:+.${!VERSION_ID}}" | ||
else | ||
echo "The file '${!FILE}' does not exist. Failing build." | ||
exit {{ FailExitCode }} | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- name: OperatingSystemVersion | ||
action: ExecuteBash | ||
inputs: | ||
commands: | ||
- | | ||
set -v | ||
FILE=/etc/os-release | ||
if [ -e ${!FILE} ]; then | ||
. ${!FILE} | ||
echo "${!VERSION_ID}" | ||
else | ||
echo "The file '${!FILE}' does not exist. Failing build." | ||
exit {{ FailExitCode }} | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- name: PlatformName | ||
action: ExecuteBash | ||
inputs: | ||
commands: | ||
- | | ||
set -v | ||
OS='{{ build.OperatingSystemName.outputs.stdout }}' | ||
|
||
if [ `echo "${!OS}" | grep -E '^(alinux|rhel|rocky)'` ]; then | ||
PLATFORM='RHEL' | ||
elif [ `echo "${!OS}" | grep -E '^ubuntu'` ]; then | ||
PLATFORM='DEBIAN' | ||
fi | ||
|
||
echo ${!PLATFORM} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach of sourcing system info from a single file rather than retrieving every time from step outputs. |
apt-get clean | ||
apt-get -y update | ||
apt-get -y install build-essential curl wget jq | ||
fi | ||
exit 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this exit 0 needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not strictly needed. It is the logic is complex. We had a failure from installing curl
on al2023. Adding exit 0
makes sure the step pass even if some package installation fail. I know this will cover up some issue. But I don't see serious problem when commands here fail to run
curl --retry 3 -L $CINC_URL | bash -s -- -v {{ ChefVersion }} | ||
[[ -e $CA_CERTS_FILE ]] && { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are changing the indirect expansion to directly check for the file. IIRC with indirect expansion we check for the existence of the symlink file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand this comment. Can you elaborate?
cd /etc/chef && tar -xzf /etc/chef/aws-parallelcluster-cookbook.tgz --strip-components 1 && rm -f aws-parallelcluster-cookbook.tgz | ||
|
||
- name: CreatingChefClientFile | ||
- name: CreateFiles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NitPick: Can we name it CreateChefFiles, I dont want the name to be same as action
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
PLATFORM='{{ build.PlatformName.outputs.stdout }}' | ||
. /opt/parallelcluster/system_info | ||
|
||
# Disable Nouveau driver | ||
/bin/sed -r -i -e 's/GRUB_CMDLINE_LINUX="(.*)"/GRUB_CMDLINE_LINUX="\1 rd.driver.blacklist=nouveau nouveau.modeset=0"/' /etc/default/grub | ||
if [[ ${!PLATFORM} == RHEL ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not need the indirect expansion here as we are no longer keeping PLATFORM='{{ build.PlatformName.outputs.stdout }}'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see we still use indirect expansion in multiple places, please correct the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Bash this syntax ${!PLATFORM}
leads to indirect expansion where the value of PLATFORM for example if RHEL, then RHEL will be treated as a variable and any value it holds will be used here.
Below is what it will do
PLATFORM='RHEL'
RHEL='OTHER VALUE'
echo "$PLATFORM" # Prints RHEL
echo "${!PLATFORM}" # Prints OTHER VALUE
The syntax ${PLATFORM}
or ``$PLATFORM` is same and is direct expansion where it only value of PLATFORM for example if RHEL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plz address the use of indirect expansion in these bash scripts as we not longer use {{ build.AWSRegion.outputs.stdout }}
for PLATFORM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought {{ build.AWSRegion.outputs.stdout }} was replaced by jinja but now i am not sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior in bash is different from CloudFormation
In CloudFormation:
To write a dollar sign and curly braces (${}) literally, add an exclamation point (!) after the open curly brace, such as ${!Literal}. CloudFormation resolves this text as ${Literal}.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I would suggest that you make these changes in all the parts of the YAML as I see in some places we are see if [[ ${!PLATFORM} == RHEL ]]; then
or if [[ $PLATFORM == RHEL ]]; then
overwrite: true | ||
|
||
- name: RemoveKernelPin | ||
- name: Cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NitPick: Can you keep a name like RemoveKernelPinAndCleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ImageBuilder has a limit of 16000. This commit reduces the size and add a unit test to prevent the template hitting the limit when parameters with long values are provided Co-authored-by: Himani Anil Deshpande <himanidp@amazon.com>
7209724
to
19d7235
Compare
Description of changes
ImageBuilder has a limit of 16000 with create component API https://docs.aws.amazon.com/imagebuilder/latest/APIReference/API_CreateComponent.html. This commit reduces the size and add a unit test to prevent the template hitting the limit when parameters with long values are provided
With this PR
Component UpdateOSComponent has size 9251
Component ParallelClusterComponent has size 9699
Component ParallelClusterTagComponent has size 8073
Component ParallelClusterTestComponent has size 13547
Tests
Checklist
develop
add the branch name as prefix in the PR title (e.g.[release-3.6]
).Please review the guidelines for contributing and Pull Request Instructions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.