Interactive and Batch Mode¶
In our High-Performance Computing (HPC) environment, users can run jobs in two primary modes: Interactive and Batch. This page provides an in-depth guide to both, assisting users in selecting the appropriate mode for their specific tasks.
Interactive Jobs: srun Command¶
The srun command is used to submit an interactive job, which runs in a shell terminal. This method is useful when you want to test a short computation or run an interactive session like a shell, Python, or an R terminal.
To start an srun session, the following syntax is used from the login node:
srun [options] [command]
The following is an example of an srun job running on 1 node, with 1 task:
srun -N 1 -n 1 -p short --pty bash
-n, --ntasks=<number>: specify the number of tasks
-N, --nodes=<minnodes[-maxnodes]>: specify the number of nodes
-p, --partition=<partition-name>: specify a partition for the job to run on
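For example, the one-node, one-task command above can be written equivalently with the long option names:
srun --nodes=1 --ntasks=1 --partition=short --pty bash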
To see all options for srun, please refer to the srun manual from SchedMD.
Examples Using srun¶
You can tailor your request to fit both the needs of the job and the partition limits if you’re familiar with the available hardware and partitions on Discovery.
To request one node and one task for 30 minutes with X11 forwarding (check that you have X11 forwarding set up) on the short partition, type:
srun --partition=short --nodes=1 --ntasks=1 --x11 --mem=2G --time=00:30:00 --pty /bin/bash
To request one node, with 10 tasks and 2 CPUs per task (a total of 20 CPUs), 40 GB of memory, for one hour on the short partition, type:
srun --partition=short --nodes 1 --ntasks 10 --cpus-per-task 2 --pty --mem=40G --time=01:00:00 /bin/bash
To request 2 nodes, each with 10 tasks per node and 2 CPUs per task (a total of 40 CPUs), 80 GB of memory, for one hour on the short partition, type:
srun --partition=short --nodes=2 --ntasks-per-node=10 --cpus-per-task=2 --pty --mem=80G --time=01:00:00 /bin/bash
To allocate a GPU node, you should specify the gpu partition and use the --gres option:
srun --partition=gpu --nodes=1 --ntasks=1 --gres=gpu:1 --mem=2G --time=01:00:00 --pty /bin/bash
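You can also run an interpreter directly under --pty instead of a shell. The following is a minimal sketch of starting an interactive Python session on a compute node; the python3 command is an assumption and may first require loading the appropriate module on your cluster:
# Minimal sketch: assumes python3 is available on the compute node (you may need to load a module first).
srun --partition=short --nodes=1 --ntasks=1 --mem=2G --time=00:30:00 --pty python3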
Batch Jobs: sbatch Command¶
The sbatch command is used to submit a job script for passive execution. The script includes the SBATCH directives that control the job parameters (e.g., number of nodes, CPUs per task, job name). A node is a single machine in the cluster allocated for computation, while a task is a unit of parallel work within the job. Not all programs are optimized to run on more than one node. We recommend testing whether increasing the number of tasks decreases job runtime before testing whether increasing the number of nodes does (see the resubmission example below).
Important
Remember that for all requests to the scheduler, the more resources you request, the longer your job may sit in the queue waiting for those resources to be allocated.
To submit a batch job, run the following from the login node:
sbatch [options] <script_file>
An example sbatch script for a job utilizing 2 nodes and 16 tasks:
Note
Only use more than 1 node if your program is optimized for multi-node execution.
#!/bin/bash
#SBATCH -J MyJob # Job name
#SBATCH -N 2 # Number of nodes
#SBATCH -n 16 # Number of tasks
#SBATCH -o output_%j.txt # Standard output file
#SBATCH -e error_%j.txt # Standard error file
#SBATCH [email protected] # Email
#SBATCH --mail-type=ALL # Type of email notifications
# Your program/command here
./my_program
To submit this job script, save it as my_job.sh and run:
sbatch my_job.sh
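Options passed on the sbatch command line override the matching SBATCH directives inside the script, so you can resubmit the same script with different resource requests. A minimal sketch of using this to test, as recommended above, whether more tasks reduce runtime (using the my_job.sh example):
# Resubmit the same script with different task counts and compare runtimes.
sbatch --ntasks=8 my_job.sh
sbatch --ntasks=32 my_job.sh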
For more information on the SBATCH directives that can be used in the script, please refer to the sbatch manual from SchedMD.
Request a specific amount of memory in the job script if your calculations require more than the default 2 GB per allocated core. The example script below requests 100 GB of memory (--mem=100G). Use one capital letter to abbreviate the unit of memory (i.e., kilo K, mega M, giga G, and tera T) with the --mem= option, as that is what Slurm expects to see:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=4:00:00
#SBATCH --job-name=MyJobName
#SBATCH --mem=100G
#SBATCH --partition=short
# <commands to execute>
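The same pattern applies to the other unit abbreviations; for example, the following directive lines (with illustrative values) would request 500 megabytes and 1 terabyte of memory, respectively:
#SBATCH --mem=500M
#SBATCH --mem=1T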