Release AWS ParallelCluster v2.10.0 · aws/aws-parallelcluster-cookbook

We're excited to announce the release of AWS ParallelCluster Cookbook 2.10.0.

This is associated with AWS ParallelCluster v2.10.0.

ENHANCEMENTS

- Add support for CentOS 8.
- Add support for instance types with multiple network cards (e.g. p4d.24xlarge).
- Enable FSx Lustre in China regions.
- Add a validation step to the AMI creation process that fails when the base AMI was created by a different version of ParallelCluster.
- Add a validation step to the AMI creation process that fails if the selected OS and the base AMI OS are not consistent.
- Add the possibility to use a post-installation script when building an AMI.
- Install NVIDIA Fabric Manager to enable NVIDIA NVSwitch on supported platforms.
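The two AMI validation steps amount to comparing metadata of the base AMI against the current build. The sketch below assumes, hypothetically, that the ParallelCluster version and OS of a base AMI are available as tags named `parallelcluster:version` and `parallelcluster:os`; those tag names, the `BaseAmi` type and `validate_base_ami` are illustrative only and are not the cookbook's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class BaseAmi:
    """Hypothetical view of a base AMI: its ID and its tags."""
    ami_id: str
    tags: dict


# Hypothetical tag keys; the real AMI build may record this information differently.
VERSION_TAG = "parallelcluster:version"
OS_TAG = "parallelcluster:os"


class AmiValidationError(Exception):
    """Raised when a base AMI must not be used for the AMI build."""


def validate_base_ami(ami: BaseAmi, pcluster_version: str, selected_os: str) -> None:
    """Fail if the base AMI comes from another ParallelCluster version or another OS."""
    built_by = ami.tags.get(VERSION_TAG)
    # No version tag: treat it as a plain base AMI not produced by ParallelCluster.
    if built_by is not None and built_by != pcluster_version:
        raise AmiValidationError(
            f"{ami.ami_id} was created by ParallelCluster {built_by}, "
            f"but the build uses ParallelCluster {pcluster_version}"
        )
    ami_os = ami.tags.get(OS_TAG)
    # The selected OS must match the OS the base AMI was built for.
    if ami_os is not None and ami_os != selected_os:
        raise AmiValidationError(
            f"Selected OS {selected_os} is not consistent with base AMI OS {ami_os}"
        )
```

Raising an exception, rather than returning a flag, mirrors the "fail the AMI creation" behavior described above: the build stops at the first inconsistency.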

CHANGES

- Upgrade EFA installer to version 1.10.1:
  - EFA configuration: efa-config-1.5 (from efa-config-1.4)
  - EFA profile: efa-profile-1.1 (from efa-profile-1.0.0)
  - EFA kernel module: efa-1.10.2 (from efa-1.6.0)
  - RDMA core: rdma-core-31.amzn0 (from rdma-core-28.amzn0)
  - Libfabric: libfabric-1.11.1amzn1.1 (from libfabric-1.10.1amzn1.1)
  - Open MPI: openmpi40-aws-4.0.5 (from openmpi40-aws-4.0.3)
  - Unifies installer runtime options across x86 and aarch64
  - Introduces the -g/--enable-gdr switch to install packages with GPUDirect RDMA support
  - Moves the OMPI collectives decision file packaging from efa-config to efa-profile
  - Introduces CentOS 8 support
- CentOS 6 is no longer supported.
- Upgrade NVIDIA driver to version 450.80.02.
- Upgrade Intel Parallel Studio XE Runtime to version 2020.2.
- Upgrade Munge to version 0.5.14.
- Retrieve the FSx Lustre DNS name dynamically.
- Slurm: change SlurmctldPort to 6820-6829 to avoid overlapping with the default slurmdbd port (6819).
- Slurm: add the compute_resource name and efa as node features.
- Improve the Slurm and Munge installation process by cleaning up existing installations from OS repositories.
- Install the Python 3 version of the aws-cfn-bootstrap scripts.
- Do not force the compute fleet into the STOPPED state when performing a cluster update. This makes it possible to update the queue size without forcing a termination of the existing instances.
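The Slurm port change is a simple range check: slurmdbd listens on 6819 by default, and the inclusive range 6820-6829 starts one port above it, so the two can no longer collide. The minimal Python sketch below shows that check, plus a hypothetical slurm.conf `NodeName` line advertising the compute resource name and `efa` as node features. The helper names and node names are illustrative assumptions, not the exact configuration ParallelCluster generates.

```python
# Default slurmdbd port (DbdPort in slurmdbd.conf).
SLURMDBD_DEFAULT_PORT = 6819


def parse_port_range(spec: str) -> range:
    """Parse a slurm.conf port value such as "6817" or "6820-6829" (bounds inclusive)."""
    low, sep, high = spec.partition("-")
    start = int(low)
    end = int(high) if sep else start
    return range(start, end + 1)


def overlaps_slurmdbd(slurmctld_port: str, dbd_port: int = SLURMDBD_DEFAULT_PORT) -> bool:
    """True if the SlurmctldPort value includes the slurmdbd port."""
    return dbd_port in parse_port_range(slurmctld_port)


def node_line(node_names: str, compute_resource: str, efa_enabled: bool) -> str:
    """Build a hypothetical slurm.conf NodeName line with node features."""
    features = [compute_resource]
    if efa_enabled:
        features.append("efa")
    return f"NodeName={node_names} Features={','.join(features)}"


print(overlaps_slurmdbd("6820-6829"))  # False
print(node_line("queue1-dy-c5n18xlarge-[1-10]", "c5n18xlarge", True))
```

With features like these defined, a job can ask for EFA-capable nodes through Slurm's constraint option, for example `sbatch --constraint=efa job.sh` (where `job.sh` is a placeholder batch script).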

BUG FIXES

- Fix ephemeral drives setup to avoid failures when partition changes require a reboot.
- Fix Chrony service management.
- Retrieve the correct number of compute instance slots when the instance type is updated.
- Fix compute fleet status initialization so that the status is configured before the daemons are started by supervisord.
