AWS ParallelCluster v2.11.0
We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.0.
This is associated with AWS ParallelCluster v2.11.0.
ENHANCEMENTS
- Add support for Ubuntu 20.04.
- Add support for using FSx Lustre in a subnet with no internet access.
- Add support for building custom CentOS 7 AMIs on ARM.
- Make sure the slurmd service is only enabled after the post-install process, which prevents users from unintentionally making a compute node available during the post-install process.
- Change ssh_target_checker.sh syntax to make the script compatible with pdsh.
- Add the ability to use a post-installation script when building a CentOS 8 AMI.
- Install SSM agent on CentOS 7 and 8.
- Transition from IMDSv1 to IMDSv2.
- Add support for `security_group_id` in packer custom builders. Customers can export the `AWS_SECURITY_GROUP_ID` environment variable to specify the security group for custom builders when building custom AMIs.
- Configure the following default gc_thresh values for performance at scale.
  - net.ipv4.neigh.default.gc_thresh1 = 0
  - net.ipv4.neigh.default.gc_thresh2 = 15360
  - net.ipv4.neigh.default.gc_thresh3 = 16384
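
To confirm that a node is actually running with the gc_thresh defaults listed above, the values can be compared with what the kernel reports. The following is a minimal sketch, not part of the cookbook: the helper name `find_mismatches` is hypothetical, and it assumes a Linux node that exposes these sysctls under the standard procfs path `/proc/sys/net/ipv4/neigh/default`.

```python
from pathlib import Path

# Defaults listed in the release notes above.
EXPECTED_GC_THRESH = {
    "gc_thresh1": 0,
    "gc_thresh2": 15360,
    "gc_thresh3": 16384,
}

# Standard Linux procfs location for the IPv4 neighbor-table sysctls.
PROC_DIR = Path("/proc/sys/net/ipv4/neigh/default")


def find_mismatches(proc_dir=PROC_DIR, expected=EXPECTED_GC_THRESH):
    """Return {name: (actual, expected)} for every value that differs.

    A file that is missing or unreadable is reported with actual=None.
    (Hypothetical helper for illustration; not shipped with the cookbook.)
    """
    mismatches = {}
    for name, want in expected.items():
        path = Path(proc_dir) / name
        try:
            actual = int(path.read_text().strip())
        except (OSError, ValueError):
            actual = None
        if actual != want:
            mismatches[name] = (actual, want)
    return mismatches


if __name__ == "__main__":
    # Print one line per sysctl whose live value differs from the defaults.
    for name, (actual, want) in find_mismatches().items():
        print(f"net.ipv4.neigh.default.{name}: found {actual}, expected {want}")
```

Running the script on a node prints nothing when all three values match the defaults.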
CHANGES
- Ubuntu 16.04 is no longer supported.
- Amazon Linux is no longer supported.
- Upgrade EFA installer to version 1.12.2
  - EFA configuration: `efa-config-1.8-1` (from `efa-config-1.7`)
  - EFA profile: `efa-profile-1.5-1` (from `efa-profile-1.4`)
  - EFA kernel module: `efa-1.12.3` (from `efa-1.10.2`)
  - RDMA core: `rdma-core-32.1amzn` (from `rdma-core-31.2amzn`)
  - Libfabric: `libfabric-1.11.2amzon1.1-1` (from `libfabric-1.11.1amzn1.0`)
  - Open MPI: `openmpi40-aws-4.1.1-2` (from `openmpi40-aws-4.1.0`)
- Increase timeout when attaching EBS volumes from 3 to 5 minutes.
- Retry `berkshelf` installation up to 3 times.
- Root volume size increased from 25GB to 35GB on all AMIs. Minimum root volume size is now 35GB.
- Upgrade Slurm to version 20.11.7.
  - Update slurmctld and slurmd systemd unit files to match the latest versions provided by Slurm.
  - Add new SlurmctldParameters, power_save_min_interval=30, so power actions will be processed every 30 seconds.
  - Specify instance GPU model as GRES GPU Type in gres.conf, instead of the previous hardcoded value `Type=tesla` for all GPUs.
- Upgrade Arm Performance Libraries (APL) to version 21.0.0
- Upgrade NICE DCV to version 2021.1-10557.
- Upgrade NVIDIA driver to version 460.73.01.
- Upgrade CUDA library to version 11.3.0.
- Upgrade NVIDIA Fabric manager to `nvidia-fabricmanager-460`.
- Install ParallelCluster AWSBatch CLI in dedicated python3 virtual env.
- Upgrade Python version used in ParallelCluster virtualenvs from version 3.6.13 to version 3.7.10.
- Upgrade Cinc Client to version 16.13.16.
- Upgrade third-party cookbook dependencies:
  - apt-7.4.0 (from apt-7.3.0)
  - iptables-8.0.0 (from iptables-7.1.0)
  - line-4.0.1 (from line-2.9.0)
  - openssh-2.9.1 (from openssh-2.8.1)
  - pyenv-3.4.2 (from pyenv-3.1.1)
  - selinux-3.1.1 (from selinux-2.1.1)
  - ulimit-1.1.1 (from ulimit-1.0.0)
  - yum-6.1.1 (from yum-5.1.0)
  - yum-epel-4.1.2 (from yum-epel-3.3.0)
- Drop `lightdm` package install from Ubuntu 18.04 DCV installation process.
- Update default NFS options used by compute nodes to mount the shared filesystem from the head node.
  - Drop `intr` option, which has been deprecated since kernel 2.6.25
  - Drop `noatime` option, which is not relevant for NFS mounts
- Drop