AWS ParallelCluster v2.11.0

enrico-usai released this 01 Jul 04:01

We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.0.

This release is associated with AWS ParallelCluster v2.11.0.

ENHANCEMENTS

  • Add support for Ubuntu 20.04.
  • Add support for using FSx Lustre in subnet with no internet access.
  • Add support for building custom CentOS 7 AMIs on ARM.
  • Enable the slurmd service only after the post-install process completes, which prevents users from unintentionally making a compute node available during post-install.
  • Change ssh_target_checker.sh syntax to make the script compatible with pdsh.
  • Add the ability to use a post-installation script when building a CentOS 8 AMI.
  • Install SSM agent on CentOS 7 and 8.
  • Transition from IMDSv1 to IMDSv2.
  • Add support for security_group_id in Packer custom builders. Customers can export the AWS_SECURITY_GROUP_ID environment variable to specify the security group for custom builders when building custom AMIs.
  • Configure the following default gc_thresh values for performance at scale:
    • net.ipv4.neigh.default.gc_thresh1 = 0
    • net.ipv4.neigh.default.gc_thresh2 = 15360
    • net.ipv4.neigh.default.gc_thresh3 = 16384
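
To check whether a node already carries the gc_thresh values listed above, something like the following Python sketch could be used. It assumes the standard Linux mapping of a dotted sysctl key to a file under /proc/sys; the helper names (`sysctl_path`, `read_gc_thresh`, `mismatches`) are hypothetical and are not part of the cookbook.

```python
from pathlib import Path

# Values copied from the release notes above.
EXPECTED_GC_THRESH = {
    "net.ipv4.neigh.default.gc_thresh1": 0,
    "net.ipv4.neigh.default.gc_thresh2": 15360,
    "net.ipv4.neigh.default.gc_thresh3": 16384,
}


def sysctl_path(key, proc_sys=Path("/proc/sys")):
    """Map a dotted sysctl key to its file under proc_sys (assumed layout)."""
    return Path(proc_sys).joinpath(*key.split("."))


def read_gc_thresh(proc_sys=Path("/proc/sys")):
    """Return the current value for each expected key, or None if unreadable."""
    current = {}
    for key in EXPECTED_GC_THRESH:
        try:
            current[key] = int(sysctl_path(key, proc_sys).read_text().strip())
        except (OSError, ValueError):
            current[key] = None
    return current


def mismatches(current):
    """Return {key: (found, expected)} for every key that differs."""
    return {
        key: (current.get(key), expected)
        for key, expected in EXPECTED_GC_THRESH.items()
        if current.get(key) != expected
    }


if __name__ == "__main__":
    for key, (found, expected) in mismatches(read_gc_thresh()).items():
        print(f"{key}: found {found}, expected {expected}")
```

Run on a node, the script prints one line per key whose value differs from the list above and prints nothing when all three match.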

CHANGES

  • Ubuntu 16.04 is no longer supported.
  • Amazon Linux is no longer supported.
  • Upgrade EFA installer to version 1.12.2:
    • EFA configuration: efa-config-1.8-1 (from efa-config-1.7)
    • EFA profile: efa-profile-1.5-1 (from efa-profile-1.4)
    • EFA kernel module: efa-1.12.3 (from efa-1.10.2)
    • RDMA core: rdma-core-32.1amzn (from rdma-core-31.2amzn)
    • Libfabric: libfabric-1.11.2amzon1.1-1 (from libfabric-1.11.1amzn1.0)
    • Open MPI: openmpi40-aws-4.1.1-2 (from openmpi40-aws-4.1.0)
  • Increase timeout when attaching EBS volumes from 3 to 5 minutes.
  • Retry berkshelf installation up to 3 times.
  • Root volume size increased from 25GB to 35GB on all AMIs. Minimum root volume size is now 35GB.
  • Upgrade Slurm to version 20.11.7.
    • Update slurmctld and slurmd systemd unit files to match the latest versions provided by Slurm.
    • Add new SlurmctldParameters value, power_save_min_interval=30, so that power actions are processed every 30 seconds.
    • Specify the instance GPU model as the GRES GPU Type in gres.conf, instead of the previously hardcoded value Type=tesla for all GPUs.
  • Upgrade Arm Performance Libraries (APL) to version 21.0.0.
  • Upgrade NICE DCV to version 2021.1-10557.
  • Upgrade NVIDIA driver to version 460.73.01.
  • Upgrade CUDA library to version 11.3.0.
  • Upgrade NVIDIA Fabric Manager to nvidia-fabricmanager-460.
  • Install ParallelCluster AWSBatch CLI in dedicated python3 virtual env.
  • Upgrade Python version used in ParallelCluster virtualenvs from version 3.6.13 to version 3.7.10.
  • Upgrade Cinc Client to version 16.13.16.
  • Upgrade third-party cookbook dependencies:
    • apt-7.4.0 (from apt-7.3.0)
    • iptables-8.0.0 (from iptables-7.1.0)
    • line-4.0.1 (from line-2.9.0)
    • openssh-2.9.1 (from openssh-2.8.1)
    • pyenv-3.4.2 (from pyenv-3.1.1)
    • selinux-3.1.1 (from selinux-2.1.1)
    • ulimit-1.1.1 (from ulimit-1.0.0)
    • yum-6.1.1 (from yum-5.1.0)
    • yum-epel-4.1.2 (from yum-epel-3.3.0)
  • Drop lightdm package install from Ubuntu 18.04 DCV installation process.
  • Update the default NFS options used by compute nodes to mount the shared filesystem from the head node.
    • Drop the intr option, which has been deprecated since kernel 2.6.25.
    • Drop the noatime option, which is not relevant for NFS mounts.
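
For custom mount configurations that still carry the two dropped NFS options, a small Python sketch of the same cleanup might look like this. The option string in the example is hypothetical and is not the cookbook's actual default; `drop_options` is an illustrative helper name.

```python
# Options removed from the default NFS mount options in this release.
DROPPED_OPTIONS = frozenset({"intr", "noatime"})


def drop_options(options, dropped=DROPPED_OPTIONS):
    """Remove the given options from a comma-separated mount option string."""
    kept = []
    for opt in options.split(","):
        opt = opt.strip()
        # Skip empty fragments and any option listed in `dropped`.
        if opt and opt not in dropped:
            kept.append(opt)
    return ",".join(kept)


if __name__ == "__main__":
    # Hypothetical option string, for illustration only.
    print(drop_options("hard,intr,noatime,_netdev"))  # prints "hard,_netdev"
```

The remaining options keep their original order, so the result can be written back to a mount entry unchanged apart from the removed options.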