Hardware Overview

The computing cluster provides access to over 1,024 CPU nodes, 50,000 CPU cores, and more than 200 GPUs, and it is connected to the university network over 10 Gbps Ethernet (GbE) for high-speed data transfer. Compute nodes are connected with either 10 GbE or a high-performance HDR200 InfiniBand (IB) interconnect running at 200 Gbps (some nodes run HDR100 IB where HDR200 is not supported).

CPU nodes

Table 1 below lists each feature name, the number of nodes by partition type (public and private), and the range of RAM per node. The feature names follow the archspec microarchitecture naming convention.

| Feature Name | Number of Nodes (public, private) | RAM per node  |
|--------------|-----------------------------------|---------------|
| skylake      | 0, 170                            | 186 - 3094 GB |
| zen2         | 40, 292                           | 256 - 2000 GB |
| zen          | 40, 300                           | 256 - 2000 GB |
| ivybridge    | 64, 130                           | 31 - 1031 GB  |
| sandybridge  | 8, 0                              | 384 GB        |
| haswell      | 230, 62                           | 109 - 1031 GB |
| broadwell    | 756, 226                          | 128 - 515 GB  |
| cascadelake  | 260, 88                           | 186 - 3094 GB |
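If you want to check which feature names are attached to which nodes, you can query Slurm directly with sinfo. A minimal sketch (the format string and column widths are illustrative, not cluster-specific):

sinfo -o "%30N %8c %10m %f"

Here %N prints the node list, %c the CPU count per node, %m the memory per node in MB, and %f the feature tags that --constraint matches against.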

If you are looking for information about GPUs, see Working with GPUs.

For more information about the partitions on Discovery, including the number of nodes per partition, running time limits, job submission limits, and RAM limits, see partition-names.

Using the --constraint flag

When using srun or sbatch, you can request specific hardware features for your job with the --constraint= flag. This is particularly useful when benchmarking, optimizing, or running code that was compiled for a specific microarchitecture. Currently, you can use --constraint= to restrict your job to a specific feature name (e.g., haswell, ivybridge), or you can use the feature ib to restrict a multi-node job to nodes connected by InfiniBand (IB).

A few examples using srun:

srun --constraint=haswell --pty /bin/bash
srun --constraint=ivybridge --pty /bin/bash
srun --constraint=ib --pty /bin/bash
srun --constraint="[ivybridge|zen2]" --pty /bin/bash  # uses the OR operator | to select either an ivybridge or a zen2 node
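Once an interactive job starts, you can confirm the features of the node you landed on. A small sketch, assuming Slurm sets the SLURMD_NODENAME environment variable in your session (it typically does on compute nodes):

scontrol show node "$SLURMD_NODENAME" | grep -i features

This prints the node's AvailableFeatures and ActiveFeatures fields, which should include the feature name you constrained on.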

You can add the same constraint to an sbatch script as an additional directive line (e.g., #SBATCH --constraint=haswell).
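For example, here is a minimal batch script sketch that restricts a job to a haswell node (the job name, time limit, and final command are placeholder values to adapt):

#!/bin/bash
#SBATCH --job-name=constraint_example
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --constraint=haswell

# Everything below runs on a haswell node.
lscpu | grep "Model name"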

Note

Using the --constraint flag can mean that your job waits longer to start, because the scheduler (Slurm) must find and allocate the specific hardware that you requested. For more information about running jobs, see using-slurm. Finally, at this time only the OR operator | is supported when using --constraint.