Slurm

Slurm (Simple Linux Utility for Resource Management) is an open-source, fault-tolerant, and highly configurable workload manager used extensively in high-performance computing (HPC) environments.

Slurm is designed for large-scale computational workloads: it distributes and manages tasks across clusters of thousands of nodes and handles resource allocation, scheduling, and job queuing. On the Discovery cluster, Slurm also supports job arrays (see Slurm Jobs Array), job monitoring and management (see Monitoring and Managing Jobs), and partition queries (see Query Partitions: sinfo).

Slurm Commands

Basic Slurm commands for running, monitoring, and canceling jobs.

Slurm Running Jobs

Advanced usage and explanation of srun and sbatch for running jobs.
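As a rough illustration of what a batch job looks like, the sketch below renders a minimal sbatch script as text. The helper name `render_batch_script` and its parameters are hypothetical and exist only for this example; the `#SBATCH` options it emits (`--job-name`, `--ntasks`, `--time`, `--output`, `--partition`) are standard sbatch options. Partition names are site-specific, so the partition is left optional rather than guessed.

```python
def render_batch_script(job_name, command, ntasks=1, time_limit="00:10:00", partition=None):
    """Return the text of a minimal sbatch script (illustrative helper, not part of Slurm)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --output=%x-%j.out",  # %x expands to the job name, %j to the job ID
    ]
    if partition is not None:
        # Partition names differ per cluster; pass one only if you know it.
        lines.append(f"#SBATCH --partition={partition}")
    # srun launches the command as a job step inside the allocation.
    lines += ["", f"srun {command}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_batch_script("demo", "hostname", ntasks=2))
```

Saving the printed text to a file such as `demo.sh` and running `sbatch demo.sh` would submit it to the queue, while `srun` can also be used on its own to launch a command directly.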

Monitoring and Managing Jobs

Advanced usage and explanation of squeue, scancel, and sinfo for monitoring and managing jobs.
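The three monitoring commands can be sketched as argument lists, which is how they would typically be passed to `subprocess` from a Python script. The function names here are hypothetical; the flags used (`squeue --user`, `sinfo --partition`, and a positional job ID for `scancel`) are standard Slurm options. The `run` helper assumes the Slurm commands are on `PATH` and is only meaningful on a cluster login node.

```python
import subprocess


def squeue_cmd(user):
    """Build the command that lists jobs belonging to one user."""
    return ["squeue", f"--user={user}"]


def scancel_cmd(job_id):
    """Build the command that cancels a job by its numeric job ID."""
    return ["scancel", str(job_id)]


def sinfo_cmd(partition=None):
    """Build the command that shows node state, optionally for one partition."""
    cmd = ["sinfo"]
    if partition is not None:
        cmd.append(f"--partition={partition}")
    return cmd


def run(cmd):
    """Run a Slurm command and return its standard output (requires Slurm on PATH)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

On a login node, `print(run(squeue_cmd("your_username")))` would show your queued and running jobs; building the commands as lists avoids shell quoting problems.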

Slurm Jobs Array

An introduction to Slurm job arrays, with use cases for launching a large series of jobs.
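A common job array pattern is to give each array task one item from a list of inputs. The sketch below assumes that pattern; `array_spec` and `select_input` are hypothetical helper names. It relies on two standard Slurm features: the `--array` option of sbatch (where a `%N` suffix caps how many tasks run at once) and the `SLURM_ARRAY_TASK_ID` environment variable that Slurm sets inside each array task.

```python
import os


def array_spec(n_tasks, max_concurrent=None):
    """Build the value for sbatch --array, e.g. '0-9' or '0-9%2'."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    spec = f"0-{n_tasks - 1}"
    if max_concurrent is not None:
        spec += f"%{max_concurrent}"  # at most this many tasks run at the same time
    return spec


def select_input(inputs, env=None):
    """Pick the input for this array task using SLURM_ARRAY_TASK_ID."""
    env = os.environ if env is None else env
    task_id = int(env["SLURM_ARRAY_TASK_ID"])
    return inputs[task_id]
```

For ten input files, a script submitted with `#SBATCH --array=0-9%2` would run ten tasks, two at a time, and each task would call `select_input(files)` to find the file it is responsible for.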