Slurm Running Jobs
You have two options when running jobs: interactively via Interactive Jobs: srun Command, or passively as a batch job via Batch Jobs: sbatch Command.
Interactive Jobs: srun Command
The srun command is used to submit an interactive job, which runs in a shell terminal. This method is useful when you want to test a short computation or run an interactive session such as a shell, Python, or an R terminal.
To start an srun session, the following syntax is used from the login node:
srun [options] [command]
The following is an example of an srun job running on 1 node, with 1 task:
srun -N 1 -n 1 -p short --pty bash
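Because srun runs whatever command you pass it, you can start an interactive interpreter instead of a shell. For example, the following is a minimal sketch that opens an interactive Python session on the short partition (it assumes a python3 executable is available on the compute node; you may need to load the appropriate module first):
srun -N 1 -n 1 -p short --pty python3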
-n, --ntasks=<number>: specify the number of tasks
-N, --nodes=<minnodes[-maxnodes]>: specify the number of nodes
-p, --partition=<partition-name>: specify a partition for the job to run on
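The short and long option forms are interchangeable. For example, the following two commands request the same allocation as the example above (one node with one task on the short partition):
srun -N 1 -n 1 -p short --pty bash
srun --nodes=1 --ntasks=1 --partition=short --pty bash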
To see all options for srun, please refer to the srun manual from SchedMD.
Examples Using srun
You can tailor your request to fit both the needs of the job and the partition limits if you’re familiar with the available hardware and partitions on Discovery.
To request one node and one task for 30 minutes with X11 forwarding on the short partition, type:
srun --partition=short --nodes=1 --ntasks=1 --x11 --mem=2G --time=00:30:00 --pty /bin/bash
To request one node, with 10 tasks and 2 CPUs per task (a total of 20 CPUs), 40 GB of memory, for one hour on the short partition, type:
srun --partition=short --nodes=1 --ntasks=10 --cpus-per-task=2 --mem=40G --time=01:00:00 --pty /bin/bash
To request 2 nodes, each with 10 tasks per node and 2 CPUs per task (a total of 40 CPUs), 80 GB of memory per node, for one hour on the short partition, type:
srun --partition=short --nodes=2 --ntasks-per-node=10 --cpus-per-task=2 --mem=80G --time=01:00:00 --pty /bin/bash
To allocate a GPU node, specify the gpu partition and use the --gres option:
srun --partition=gpu --nodes=1 --ntasks=1 --gres=gpu:1 --mem=2G --time=01:00:00 --pty /bin/bash
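If the cluster offers more than one GPU model, --gres also accepts a type and a count in the form gpu:<type>:<count>. The type name below (v100) is only a placeholder; check your cluster's documentation for the actual values. For example, to request two GPUs of a specific type:
srun --partition=gpu --nodes=1 --ntasks=1 --gres=gpu:v100:2 --mem=2G --time=01:00:00 --pty /bin/bash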
Batch Jobs: sbatch Command
The sbatch command is used to submit a job script for passive (batch) execution. The script includes #SBATCH directives that control the job parameters (e.g., number of nodes, CPUs per task, job name). To submit a batch job, run the following from the login node:
sbatch [options] <script_file>
An example sbatch script for a job utilizing 2 nodes and 16 tasks:
#!/bin/bash
#SBATCH -J MyJob # Job name
#SBATCH -N 2 # Number of nodes
#SBATCH -n 16 # Number of tasks
#SBATCH -o output_%j.txt # Standard output file
#SBATCH -e error_%j.txt # Standard error file
# Your program/command here
./my_program
To submit this job script, save it as my_job.sh and run:
sbatch my_job.sh
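If the submission succeeds, sbatch prints the assigned job ID (for example, Submitted batch job 123456). The %j placeholder in the output and error file names is replaced with that ID, so this job would write output_123456.txt and error_123456.txt.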
For more information on the #SBATCH directives that can be used in the script, please refer to the sbatch manual from SchedMD.
Request a specific amount of memory in the job script if your calculations require more than the default 2 GB per allocated core. The example script below requests 100 GB of memory (--mem=100G). Use one capital letter to abbreviate the unit of memory (i.e., kilo K, mega M, giga G, and tera T) with the --mem= option, as that is what Slurm expects to see:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=4:00:00
#SBATCH --job-name=MyJobName
#SBATCH --mem=100G
#SBATCH --partition=short
# <commands to execute>
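Note that the unit abbreviations are binary multiples, so --mem=100G requests the same amount of memory as --mem=102400M.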
If you need exclusive use of a node (e.g., for a job with high I/O requirements), you can use the --exclusive flag. The example script below requests exclusive use of one node in the short partition for four hours:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=4:00:00
#SBATCH --job-name=MyJobName
#SBATCH --exclusive
#SBATCH --partition=short
# <commands to execute>