Slurm

Slurm (Simple Linux Utility for Resource Management) is an open-source, highly configurable, fault-tolerant, and adaptable workload manager. It is extensively used across High-Performance Computing (HPC) environments.

Slurm is designed for the demands of large-scale computational workloads. It distributes and manages jobs across clusters of up to thousands of nodes, handling resource allocation, scheduling, and job queuing. On the HPC cluster, Slurm provides functionality such as job arrays and dependencies, job monitoring and management, account information, and cluster and node state reporting (via the sinfo command).

Basic Slurm Usage

This page covers the basic Slurm commands for running, monitoring, and canceling jobs.
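As a rough sketch of the submit-then-cancel cycle, the Python below wraps sbatch and scancel with `subprocess`. It uses `sbatch --parsable`, which prints only the job ID (followed by `;cluster` on multi-cluster setups). The job name, time limit, and script name are illustrative assumptions. The command-building and parsing helpers are kept separate from the functions that actually invoke Slurm, so they can be tried without a cluster.

```python
import subprocess
from typing import List


def sbatch_command(script: str, job_name: str, time_limit: str) -> List[str]:
    """Build an sbatch command line; --parsable makes sbatch print just the job ID."""
    return [
        "sbatch",
        "--parsable",
        f"--job-name={job_name}",
        f"--time={time_limit}",
        script,
    ]


def parse_job_id(sbatch_stdout: str) -> str:
    """Extract the job ID from `sbatch --parsable` output ("12345" or "12345;cluster")."""
    return sbatch_stdout.strip().split(";")[0]


def submit(script: str, job_name: str = "example", time_limit: str = "00:10:00") -> str:
    """Submit a batch script and return its job ID (requires Slurm on PATH)."""
    result = subprocess.run(
        sbatch_command(script, job_name, time_limit),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_job_id(result.stdout)


def cancel(job_id: str) -> None:
    """Cancel a job by ID with scancel (requires Slurm on PATH)."""
    subprocess.run(["scancel", job_id], check=True)
```

On a cluster, `job_id = submit("job.sh")` (where `job.sh` is a hypothetical batch script) followed by `cancel(job_id)` submits and then cancels a job.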

This page explains srun and sbatch and covers their advanced usage for running jobs.

This page explains squeue, scontrol, and sinfo and covers their advanced usage for monitoring jobs.
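For scripted monitoring, squeue output can be narrowed with `--noheader` and a custom `--format`. The sketch below requests `%i` (job ID) and `%T` (job state, e.g. PENDING or RUNNING) and turns the result into a dictionary. It assumes a Slurm release recent enough to support `--me` (limit output to the current user's jobs); on older versions, `-u` with your username does the same.

```python
import subprocess
from typing import Dict, List

# %i = job ID, %T = job state in long form (e.g. PENDING, RUNNING)
SQUEUE_CMD: List[str] = ["squeue", "--me", "--noheader", "--format=%i %T"]


def parse_squeue(output: str) -> Dict[str, str]:
    """Map job ID -> state from `squeue --noheader --format="%i %T"` output."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank or unexpected lines rather than failing.
        if len(fields) == 2:
            job_id, state = fields
            states[job_id] = state
    return states


def my_job_states() -> Dict[str, str]:
    """Query squeue for the current user's jobs (requires Slurm on PATH)."""
    result = subprocess.run(SQUEUE_CMD, capture_output=True, text=True, check=True)
    return parse_squeue(result.stdout)
```

Keeping the parser separate from the subprocess call makes it easy to test against saved squeue output.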

Advanced Slurm Usage

This page introduces Slurm job arrays and their use cases for launching a large series of jobs.
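Inside each task of a job array (for example, one submitted with `sbatch --array=0-2 job.sh`), Slurm sets the `SLURM_ARRAY_TASK_ID` environment variable. A common pattern is to use that index to choose one input per task. The sketch below assumes a hypothetical list of input files and falls back to index 0 when run outside Slurm.

```python
import os
from typing import List, Mapping


def task_index(env: Mapping[str, str] = os.environ) -> int:
    """Return this array task's index; 0 if not running as a Slurm array task."""
    return int(env.get("SLURM_ARRAY_TASK_ID", "0"))


def select_input(inputs: List[str], env: Mapping[str, str] = os.environ) -> str:
    """Pick the input file that this array task should process."""
    return inputs[task_index(env)]


# Hypothetical inputs: with --array=0-2, each task handles exactly one file.
INPUTS = ["sample_0.csv", "sample_1.csv", "sample_2.csv"]
```

Within the batch script, a call such as `select_input(INPUTS)` then gives each array task its own file. The array range should match the number of inputs so that no index falls outside the list.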

This page describes best practices for submitting jobs with Slurm on the HPC.