Releases: aws/aws-parallelcluster-cookbook
AWS ParallelCluster v3.4.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.4.1
This is associated with AWS ParallelCluster v3.4.1
BUG FIXES
- Fix an issue with the Slurm scheduler that might incorrectly apply updates to its internal registry of compute nodes. This might result in EC2 instances becoming inaccessible or being backed by an incorrect instance type.
AWS ParallelCluster v3.4.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.4.0
This is associated with AWS ParallelCluster v3.4.0
ENHANCEMENTS
- Add support for specifying multiple subnets for each queue to increase the EC2 capacity pool available for use.
CHANGES
- Upgrade EFA installer to 1.20.0.
  - Efa-driver: efa-2.1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.1
  - Rdma-core: rdma-core-43.0-2
  - Open MPI: openmpi40-aws-4.1.4-3
- Mount EFS file systems using amazon-efs-utils. EFS file systems can be mounted using in-transit encryption and an IAM authorized user.
- Install stunnel 5.67 on CentOS7 and Ubuntu to support EFS in-transit encryption.
- Add possibility to execute a custom script on the head node during the update of the cluster.
- Upgrade Slurm to version 22.05.6.
- Upgrade Python to 3.9.16 and 3.7.16.
AWS ParallelCluster v2.11.9
We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.9
This is associated with AWS ParallelCluster v2.11.9
CHANGES
- There were no notable changes for this version.
AWS ParallelCluster v3.3.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.3.1
This is associated with AWS ParallelCluster v3.3.1
CHANGES
- There were no changes for this version.
AWS ParallelCluster v3.1.5
We're excited to announce the release of AWS ParallelCluster Cookbook 3.1.5
This is associated with AWS ParallelCluster v3.1.5
CHANGES
- Upgrade EFA installer to 1.18.0.
  - Efa-driver: efa-1.16.0-1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1
  - Rdma-core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-2
- Upgrade Intel MPI Library to 2021.6.0.602.
- Upgrade NVIDIA driver to version 470.141.03.
- Upgrade NVIDIA Fabric Manager to version 470.141.03.
BUG FIXES
- Fix a Slurm issue that prevents the termination of idle nodes.
AWS ParallelCluster v2.11.8
We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.8
This is associated with AWS ParallelCluster v2.11.8
CHANGES
- Upgrade Intel MPI Library to 2021.6.0.602.
- Upgrade EFA installer to 1.19.0.
  - Efa-driver: efa-1.16.0-1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.0-1
  - Rdma-core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-3
AWS ParallelCluster v3.3.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.3.0
This is associated with AWS ParallelCluster v3.3.0
ENHANCEMENTS
- Add support for Slurm Accounting.
- Add support for adding and removing shared storage at cluster update.
- Add possibility to specify multiple instance types for the same compute resource.
- Configure NFS threads to be min(256, max(8, num_cores * 4)) to ensure better stability and performance.
- Move NFS installation to build time to reduce configuration time.
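As a rough illustration of the NFS thread sizing rule listed above, the following sketch evaluates min(256, max(8, num_cores * 4)) for a few core counts. The function name nfs_thread_count and the use of os.cpu_count() are assumptions made for this example; the cookbook may obtain the core count differently.

```python
import os


def nfs_thread_count(num_cores: int) -> int:
    """Return 4 threads per core, clamped to the range [8, 256]."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return min(256, max(8, num_cores * 4))


if __name__ == "__main__":
    # os.cpu_count() can return None; falling back to 1 is an assumption of this sketch.
    local_cores = os.cpu_count() or 1
    print(f"local: {local_cores} cores -> {nfs_thread_count(local_cores)} threads")
    for cores in (1, 2, 16, 64, 96):
        print(f"{cores} cores -> {nfs_thread_count(cores)} threads")
```

The lower bound keeps small instances from running with too few threads, while the upper bound caps the count on large instances.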
CHANGES
- Upgrade NVIDIA driver to version 470.141.03.
- Upgrade NVIDIA Fabric Manager to version 470.141.03.
- Upgrade NVIDIA CUDA Toolkit to version 11.7.1.
- Disable cron job tasks man-db and mlocate, which may have a negative impact on node performance.
- Reduce the timeout from 50 minutes to a maximum of 5 minutes in case of DynamoDB connection issues at compute node bootstrap.
- Change the logic to number the routing tables when an instance has multiple NICs.
- Upgrade Python from 3.7.13 to 3.9.15.
- Upgrade Slurm to version 22.05.5.
- Upgrade EFA installer to 1.18.0.
  - Efa-driver: efa-1.16.0-1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1
  - Rdma-core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-2
- Upgrade NICE DCV to version 2022.1-13300.
  - server: 2022.1.13300-1
  - xdcv: 2022.1.433-1
  - gl: 2022.1.973-1
  - web_viewer: 2022.1.13300-1
- Upgrade third-party cookbook dependencies:
  - selinux-6.0.5 (from selinux-6.0.4)
  - nfs-5.0.0 (from nfs-2.6.4)
AWS ParallelCluster v3.2.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.2.1
This is associated with AWS ParallelCluster v3.2.1
ENHANCEMENTS
- Improve the logic to associate the host routing tables to the different network cards to better support EC2 instances with several NICs.
CHANGES
- Upgrade NVIDIA driver to version 470.141.03.
- Upgrade NVIDIA Fabric Manager to version 470.141.03.
- Pin cfn-bootstrap helper package version to 2.0-10.
- Disable cron job tasks man-db and mlocate, which may have a negative impact on node performance.
- Upgrade Intel MPI Library to 2021.6.0.602.
- Upgrade Python from 3.7.10 to 3.7.13 in response to this security risk.
AWS ParallelCluster v3.2.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.2.0
This is associated with AWS ParallelCluster v3.2.0
ENHANCEMENTS
- Add support for multiple Elastic File Systems.
- Add support for multiple FSx file systems.
- Add support for attaching existing FSx for ONTAP and FSx for OpenZFS file systems.
- Install NVIDIA GDRCopy 2.3 to enable low-latency GPU memory copy on supported instance types.
- During a cluster update, set the state of Slurm nodes according to the strategy set through the configuration parameter Scheduling/SchedulerSettings/QueueUpdateStrategy.
- Add support for memory-based scheduling in Slurm.
  - Configure RealMemory on compute nodes by default as 95% of the EC2 memory.
  - Move SelectTypeParameters to the slurm_parallelcluster.conf include file.
  - Move ConstrainRAMSpace to the slurm_parallelcluster_cgroup.conf include file.
  - Add support for the new configuration parameter Scheduling/SlurmSettings/EnableMemoryBasedScheduling to configure memory-based scheduling in Slurm.
  - Add support for the new configuration parameter Scheduling/SlurmQueues/ComputeResources/SchedulableMemory to override the default value of the memory seen by the scheduler on compute nodes.
- Add support for rebooting compute nodes via Slurm.
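The default RealMemory value mentioned in the memory-based scheduling notes above comes down to simple arithmetic. The sketch below assumes the instance memory is known in MiB and that the result is rounded down to a whole MiB. The function name, the rounding choice, and the optional override argument (which only mirrors the idea of SchedulableMemory) are illustrative assumptions, not the cookbook's actual code.

```python
from typing import Optional

DEFAULT_MEMORY_PERCENT = 95  # from the release note: 95% of the EC2 memory


def real_memory_mib(ec2_memory_mib: int, schedulable_memory_mib: Optional[int] = None) -> int:
    """Return the memory (MiB) the scheduler would see for a compute node.

    If an explicit override is given (hypothetical stand-in for
    SchedulableMemory), it wins; otherwise use 95% of the instance
    memory, rounded down.
    """
    if ec2_memory_mib <= 0:
        raise ValueError("ec2_memory_mib must be positive")
    if schedulable_memory_mib is not None:
        return schedulable_memory_mib
    # Integer arithmetic avoids floating-point rounding surprises.
    return ec2_memory_mib * DEFAULT_MEMORY_PERCENT // 100


if __name__ == "__main__":
    for mem in (8192, 16384):
        print(f"{mem} MiB instance -> RealMemory={real_memory_mib(mem)}")
```

Leaving a margin below the full instance memory reserves headroom for the operating system and daemons, so jobs are not scheduled against memory that is not actually available.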
CHANGES
- Restart clustermgtd and slurmctld daemons at cluster update time only when Scheduling parameters are updated in the cluster configuration.
- Update slurmctld and slurmd systemd service files.
- Upgrade NICE DCV to version 2022.0-12760.
- Upgrade NVIDIA driver to version 470.129.06.
- Upgrade NVIDIA Fabric Manager to version 470.129.06.
- Upgrade EFA installer to version 1.17.2.
  - EFA driver: efa-1.16.0-1
  - EFA configuration: efa-config-1.10-1
  - EFA profile: efa-profile-1.5-1
  - Libfabric: libfabric-aws-1.16.0~amzn2.0-1
  - RDMA core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-2
- Restrict IPv6 access to IMDS to root and cluster admin users only, when configuration parameter HeadNode/Imds/Secured is enabled.
- Set Slurm configuration AuthInfo=cred_expire=70 to reduce the time requeued jobs must wait before starting again when nodes are not available.
- Move SelectTypeParameters and ConstrainRAMSpace to the parallelcluster_slurm*.conf include files.
- Upgrade third-party cookbook dependencies:
  - apt-7.4.2 (from apt-7.4.0)
  - line-4.5.2 (from line-4.0.1)
  - openssh-2.10.3 (from openssh-2.9.1)
  - pyenv-3.5.1 (from pyenv-3.4.2)
  - selinux-6.0.4 (from selinux-3.1.1)
  - yum-7.4.0 (from yum-6.1.1)
  - yum-epel-4.5.0 (from yum-epel-4.1.2)
- Disable aws-ubuntu-eni-helper service, available in Deep Learning AMIs, to avoid conflicts with configure_nw_interface.sh when configuring instances with multiple network cards.
- Set MTU to 9001 for all the network interfaces when configuring instances with multiple network cards.
- Remove the trailing dot when configuring the compute node FQDN.
AWS ParallelCluster v3.1.4
We're excited to announce the release of AWS ParallelCluster Cookbook 3.1.4
This is associated with AWS ParallelCluster v3.1.4
CHANGES
- Upgrade Slurm to version 21.08.8-2.
ENHANCEMENTS
- Add support for enabling JWT authentication in Slurm.