Releases: aws/aws-parallelcluster-cookbook
AWS ParallelCluster v2.9.1
We're excited to announce the release of AWS ParallelCluster Cookbook 2.9.1.
This is associated with AWS ParallelCluster v2.9.1.
CHANGES
- There were no notable changes for this version.
AWS ParallelCluster v2.9.0
We're excited to announce the release of AWS ParallelCluster Cookbook 2.9.0.
This is associated with AWS ParallelCluster v2.9.0.
ENHANCEMENTS
- Add support for the multiple queues and multiple instance types feature with the Slurm scheduler (a configuration sketch follows this list).
- Extend NICE DCV support to ARM instances.
- Extend support to disable hyperthreading on instances (like *.metal) that don't support CpuOptions in LaunchTemplate.
- Enable support for NFS 4 for the filesystems shared from the head node.
- Add script wrapper to support Torque-like commands with the Slurm scheduler.
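To illustrate the multiple queues feature referenced above, the fragment below sketches how a ParallelCluster 2.9 configuration might define two Slurm queues with different instance types. The section and option names (queue_settings, [queue], [compute_resource]) reflect the 2.9 configuration format as we understand it; treat this as a non-authoritative sketch and check the ParallelCluster documentation for the exact keys.
    [cluster default]
    scheduler = slurm
    queue_settings = ondemand, spot

    [queue ondemand]
    compute_resource_settings = c5large
    compute_type = ondemand

    [queue spot]
    compute_resource_settings = c5xlarge
    compute_type = spot

    [compute_resource c5large]
    instance_type = c5.large
    max_count = 10

    [compute_resource c5xlarge]
    instance_type = c5.xlarge
    max_count = 20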
CHANGES
- A Route53 private hosted zone is now created together with the cluster and used in DNS resolution inside cluster nodes when using the Slurm scheduler.
- Upgrade EFA installer to version 1.9.5:
  - EFA configuration: efa-config-1.4 (from efa-config-1.3)
  - EFA profile: efa-profile-1.0.0
  - EFA kernel module: efa-1.6.0 (no change)
  - RDMA core: rdma-core-28.amzn0 (no change)
  - Libfabric: libfabric-1.10.1amazon1.1 (no change)
  - Open MPI: openmpi40-aws-4.0.3 (no change)
- Upgrade Slurm to version 20.02.4.
- Apply the following changes to the Slurm configuration (an illustrative excerpt follows at the end of this CHANGES list):
  - Assign a range of 10 ports to Slurmctld in order to better perform with large cluster settings
  - Configure cloud scheduling logic
  - Set ReconfigFlags=KeepPartState
  - Set MessageTimeout=60
  - Set TaskPlugin=task/affinity,task/cgroup together with TaskAffinity=no and ConstrainCores=yes in cgroup.conf
- Upgrade NICE DCV to version 2020.1-9012.
- Use private IP instead of the master node hostname when mounting shared NFS drives.
- Add new log streams to CloudWatch: chef-client, clustermgtd, computemgtd, slurm_resume, slurm_suspend.
- Remove dependency on cfn-init in compute nodes bootstrap.
- Add support for queue names in pre/post install scripts.
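As a rough illustration of the Slurm configuration changes above, the relevant settings would appear in slurm.conf and cgroup.conf roughly as follows. The port range shown is only an example of "a range of 10 ports" and is not necessarily the one the cookbook uses.
    # slurm.conf (illustrative excerpt)
    SlurmctldPort=6820-6829
    ReconfigFlags=KeepPartState
    MessageTimeout=60
    TaskPlugin=task/affinity,task/cgroup

    # cgroup.conf (illustrative excerpt)
    TaskAffinity=no
    ConstrainCores=yes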
BUG FIXES
- Solve a dpkg lock issue on Ubuntu that prevented custom AMI creation in some cases.
AWS ParallelCluster v2.8.1
We're excited to announce the release of AWS ParallelCluster Cookbook 2.8.1.
This is associated with AWS ParallelCluster v2.8.1.
CHANGES
- Disable screen lock for DCV desktop sessions to prevent users from being locked out.
AWS ParallelCluster v2.8.0
We're excited to announce the release of AWS ParallelCluster Cookbook 2.8.0.
This is associated with AWS ParallelCluster v2.8.0.
ENHANCEMENTS
- Enable support for ARM instances on Ubuntu 18.04 and Amazon Linux 2.
- Install PMIx v3.1.5 and provide Slurm support for it on all supported operating systems except for CentOS 6.
- Install glibc-static, which is required to support certain options for the Intel MPI compiler.
CHANGES
- Disable the libvirtd service on CentOS 7. Virtual bridge interfaces are incorrectly detected by Open MPI and cause MPI applications to hang; see https://www.open-mpi.org/faq/?category=tcp#tcp-selection for details.
- Use CINC instead of Chef for provisioning instances. See https://cinc.sh/about/ for details.
- Retry when an NFS mount fails.
- Install the pyenv virtual environments used by ParallelCluster cookbook and node daemon code under /opt/parallelcluster instead of under /usr/local.
- Avoid downloading the source for env2 at installation time.
- Drop dependency on the gems ridley and ffi-libarchive.
- Vendor cookbooks as part of instance provisioning, rather than doing so before copying the cookbook into an instance. Users no longer need to have berks installed locally.
- Drop the dependencies on the poise-python, tar and hostname third-party cookbooks.
- Use the new official CentOS 7 AMI as the base image for the ParallelCluster AMIs.
- Upgrade NVIDIA driver to Tesla version 440.95.01 on CentOS 6 and version 450.51.05 on all other distros.
- Upgrade CUDA library to version 11.0 on all distros besides CentOS 6.
- Install third-party cookbook dependencies via local source, rather than using the Chef supermarket.
- Use https wherever possible in download URLs.
- Upgrade EFA installer to version 1.9.4:
  - Kernel module: efa-1.6.0 (from efa-1.5.1)
  - RDMA core: rdma-core-28.amzn0 (from rdma-core-25.0)
  - Libfabric: libfabric-1.10.1amazon1.1 (updated from libfabric-aws-1.9.0amzn1.1)
  - Open MPI: openmpi40-aws-4.0.3 (no change)
BUG FIXES
- Fix issue that was preventing concurrent use of custom node and pcluster CLI packages.
- Use the correct domain name when contacting AWS services from the China partition.
- Avoid pinning to a specific release of the Intel HPC platform.
AWS ParallelCluster v2.7.0
We're excited to announce the release of AWS ParallelCluster Cookbook 2.7.0.
This is associated with AWS ParallelCluster v2.7.0.
CHANGES
- Upgrade NICE DCV to version 2020.0-8428.
- Upgrade Intel MPI to version U7.
- Upgrade NVIDIA driver to version 440.64.00.
- Upgrade EFA installer to version 1.8.4:
  - Kernel module: efa-1.5.1 (no change)
  - RDMA core: rdma-core-25.0 (no change)
  - Libfabric: libfabric-aws-1.9.0amzn1.1 (no change)
  - Open MPI: openmpi40-aws-4.0.3 (updated from openmpi40-aws-4.0.2)
- Upgrade CentOS 7 AMI to version 7.8.
BUG FIXES
- Fix recipes installation at runtime by adding the bootstrapped file at the end of the last Chef run.
- Fix installation of the Lustre client on CentOS 7.
- FSx Lustre: exit with an error when failing to retrieve the FSx mountpoint.
AWS ParallelCluster v2.6.1
We're excited to announce the release of AWS ParallelCluster 2.6.1.
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Change ProctrackType from proctrack/gpid to proctrack/cgroup in slurm.conf in order to better handle termination of
stray processes when running MPI applications. This also includes the creation of a cgroup Slurm configuration in
in order to enable the cgroup plugin. - Skip execution, at node bootstrap time, of all those install recipes that are already applied at AMI creation time.
The old behaviour can be restored setting the property "skip_install_recipes" to "no" through extra_json. The old
behaviour is required in case a custom_node_package is specified and could be needed in case custom_cookbook is used
(depending or not if the custom cookbook contains changes into any *_install recipes) - Start CloudWatch agent earlier in the node bootstrapping phase so that cookbook execution failures are correctly
uploaded and are available for troubleshooting.
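For reference, the extra_json override mentioned above would be set in the cluster section of the ParallelCluster configuration file, roughly as in the sketch below; the nesting under the "cluster" key is our assumption based on how other cookbook attributes are passed, so verify it against the ParallelCluster documentation.
    [cluster default]
    extra_json = { "cluster" : { "skip_install_recipes" : "no" } }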
CHANGES
- FSx Lustre: remove x-systemd.requires=lnet.service from the mount options in order to rely on the default lnet setup provided by Lustre.
- Enforce Packer version to be >= 1.4.0 when building an AMI. This is also required for customers using the pcluster createami command.
- Remove the /tmp/proxy.sh file. Proxy configuration is now written into /etc/profile.d/proxy.sh.
- Omit cfn-init-cmd and cfn-wire from the files stored in CloudWatch logs.
BUG FIXES
- Fix installation of Intel Parallel Studio XE Runtime, which requires yum4 since version 2019.5.
- Fix compilation of Torque scheduler on Ubuntu 18.04.
Support
Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192
AWS ParallelCluster v2.6.0
We're excited to announce the release of AWS ParallelCluster Cookbook 2.6.0.
This is associated with AWS ParallelCluster v2.6.0.
ENHANCEMENTS
- Add support for Amazon Linux 2
- Install and set up the CloudWatch agent for logging capability
- Install NICE DCV on Ubuntu 18.04 (this includes the ubuntu-desktop, lightdm and mesa-utils packages)
- Install and set up Amazon Time Sync on all OSs
- Enable the accounting plugin in Slurm for all OSes. Note: accounting is neither enabled nor configured by default (an illustrative example follows this list)
- Enable FSx Lustre on Ubuntu 18.04 and Ubuntu 16.04
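Because accounting is shipped disabled, wiring it up is left to the user. A minimal, illustrative slurm.conf fragment, assuming you run and configure a slurmdbd backend yourself (none of this comes from the cookbook), might look like:
    # slurm.conf (user-provided, illustrative)
    JobAcctGatherType=jobacct_gather/linux
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageHost=<your-slurmdbd-host>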
CHANGES
- Upgrade Slurm to version 19.05.5
- Upgrade Intel MPI to version U6
- Upgrade EFA installer to version 1.8.3:
- Kernel module: efa-1.5.1 (updated from efa-1.4.1)
- RDMA core: rdma-core-25.0 (distributed only) (no change)
- Libfabric: libfabric-aws-1.9.0amzn1.1 (updated from libfabric-aws-1.8.1amzn1.3)
- Open MPI: openmpi40-aws-4.0.2 (no change)
- Add SHA256 checksum verification to verify integrity of NICE DCV packages
- Install Python 2.7.17 on CentOS 6 and set it as default through pyenv
- Install Ganglia from repository on Amazon Linux, Amazon Linux 2, CentOS 6 and CentOS 7
- Disable StrictHostKeyChecking for SSH client when target host is inside cluster VPC for all OSs except CentOS 6
- Pin Intel Python 2 and Intel Python 3 to version 2019.4
- Automatically disable ptrace protection on Ubuntu 18.04 and Ubuntu 16.04 compute nodes when EFA is enabled
- Packer version >= 1.4.0 is required for AMI creation
BUG FIXES
- Fix an issue with the slurmd daemon not being restarted correctly when a compute node is rebooted
- Fix errors causing Torque to be unable to locate jobs, by setting server_name to the FQDN on the master node
- Fix a Torque issue that was limiting the max number of running jobs to the max size of the cluster
- Slurm: configured the StateSaveLocation and SlurmdSpoolDir directories to be writable only by the slurm user
Support
Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192
AWS ParallelCluster v2.5.1
We're excited to announce the release of AWS ParallelCluster Cookbook 2.5.1.
This is associated with AWS ParallelCluster v2.5.1.
Changes
- Upgrade NVIDIA driver to Tesla version 440.33.01.
- Upgrade CUDA library to version 10.2.
- Upgrade EFA installer to version 1.7.1:
- Kernel module: efa-1.4.1
- RDMA core: rdma-core-25.0
- Libfabric: libfabric-aws-1.8.1amzn1.3
- Open MPI: openmpi40-aws-4.0.2
Bug Fixes
- Fix installation of NVIDIA drivers on Ubuntu 18.
- Fix installation of the CUDA toolkit on CentOS 6.
- Fix installation of Munge on Amazon Linux, CentOS 6, CentOS 7 and Ubuntu 16.
- Export shared directories to all CIDR blocks in a VPC rather than just the first one.
Support
Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192
AWS ParallelCluster v2.5.0
We're excited to announce the release of AWS ParallelCluster Cookbook 2.5.0.
This is associated with AWS ParallelCluster v2.5.0.
Enhancements
- Install NICE DCV on CentOS 7 (this includes Gnome and Xorg packages).
- Install Intel Parallel Studio 2019.5 Runtime in the CentOS 7 AMI and share /opt/intel over NFS.
- Add support for Ubuntu 18.
Changes
- Remove support for Ubuntu 14.
- Upgrade Intel MPI to version U5.
- Upgrade EFA Installer to version 1.6.2, this also upgrades Open MPI to 4.0.2.
- Upgrade NVIDIA driver to Tesla version 418.87.
- Upgrade CUDA library to version 10.1.
- Upgrade Slurm to version 19.05.3-2.
- Slurm: changed the following parameters in the global configuration (an illustrative excerpt follows at the end of this Changes list):
  - SelectType=cons_tres, SelectTypeParameter=CR_CPU_Memory, GresTypes=gpu: needed to enable support for GPU scheduling.
  - EnforcePartLimits=ALL: jobs which exceed a partition's size and/or time limits will be rejected at submission time.
  - Removed FastSchedule since it is deprecated.
  - SlurmdTimeout=180, UnkillableStepTimeout=180: to allow the scheduler to recover, especially when under heavy load.
- Echo compute instance type and memory information in COMPUTE_READY message
- Changes to sshd config:
  - Disable X11Forwarding by default
  - Limit SSH Ciphers to aes128-cbc,aes192-cbc,aes256-cbc,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
  - Limit SSH MACs to hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512,hmac-sha2-256
- Increase default root volume to 25GB.
- Enable the flock user_xattr noatime Lustre options by default everywhere and x-systemd.automount x-systemd.requires=lnet.service for systemd-based systems.
- Install EFA in China AMIs.
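To make the Slurm parameter changes above concrete, the corresponding slurm.conf lines would look roughly like the excerpt below. This is an illustrative sketch rather than the cookbook template; note that in slurm.conf syntax the select plugin is spelled select/cons_tres and the parameter key is SelectTypeParameters.
    # slurm.conf (illustrative excerpt)
    SelectType=select/cons_tres
    SelectTypeParameters=CR_CPU_Memory
    GresTypes=gpu
    EnforcePartLimits=ALL
    SlurmdTimeout=180
    UnkillableStepTimeout=180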
Bug Fixes
- Fix Ganglia not starting on Ubuntu 16.
- Fix a bug that was preventing nodes from mounting partitioned EBS volumes.
Support
Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192
AWS ParallelCluster v2.4.1
We're excited to announce the release of AWS ParallelCluster Cookbook 2.4.1.
This is associated with AWS ParallelCluster v2.4.1.
Enhancements
- Install Intel MPI on Amazon Linux, CentOS 7 and Ubuntu 16.04
- Upgrade EFA to version 1.4.1
- Run all node daemons and cookbook recipes in isolated Python virtualenvs. This allows our code to always run with the required Python dependencies and solves all conflicts and runtime failures that were being caused by user packages installed in the system Python.
Changes
- Torque: upgrade to version 6.1.2
- Run all node daemons with Python 3.6
- Torque: changed the following parameters in the global configuration:
  - server node_check_rate = 120: specifies the minimum duration (in seconds) that a node can fail to send a status update before being marked down by the pbs_server daemon. Previously 600. This reduces scaling reaction times in case of instance failure or unexpected termination (especially with Spot).
  - server node_ping_rate = 60: specifies the maximum interval (in seconds) between successive "pings" sent from the pbs_server daemon to the pbs_mom daemon to determine node/daemon health. Previously 300. Set to half of node_check_rate.
  - server timeout_for_job_delete = 30: the timeout used when deleting jobs because the node they are executing on is being deleted. Previously 120. This prevents job deletion from hanging for more than 30 seconds when the node the jobs are running on is being deleted.
  - server timeout_for_job_requeue = 30: the timeout used when requeuing jobs because the node they are executing on is being deleted. Previously 120. This prevents node deletion from hanging for more than 30 seconds when a job cannot be rescheduled.
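Server attributes like these are typically applied with Torque's qmgr utility; the commands below sketch how the values above could be set by hand and are not copied from the cookbook recipe:
    qmgr -c "set server node_check_rate = 120"
    qmgr -c "set server node_ping_rate = 60"
    qmgr -c "set server timeout_for_job_delete = 30"
    qmgr -c "set server timeout_for_job_requeue = 30"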
Bug Fixes
- Restore the correct value for filehandle_limit that was getting reset when setting memory_limit for EFA
- Torque: fix configuration of server operators that was preventing compute nodes from disabling themselves before termination
Support
Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192