Graphics processing units (GPUs) are accelerators that can speed up certain operations. They are especially good at linear-algebra computations such as matrix multiplication.
Software must be written specifically to take advantage of one or more GPUs. You should only allocate GPUs for software that you know can take advantage of the GPU.
We currently provide access to two types of GPUs:
| | Nvidia H200 SXM 141GB | Nvidia L40S 48GB |
|---|---|---|
| Partition | gpu-h200 | gpu-l40s or gpu-short |
| Nodes/GPUs | 2 nodes x 4 GPUs | 2 nodes x 8 GPUs |
| Architecture | Hopper | Ada Lovelace |
| VRAM | 141 GB | 48 GB |
| Memory Bandwidth | 4800 GB/s | 864 GB/s |
| TDP | 700 W | 350 W |
| FP64 Performance | 34.0 TFLOPS | 1.4 TFLOPS |
| FP32 Performance | 67.0 TFLOPS | 91.6 TFLOPS |
| FP16 Performance | 989.5 TFLOPS | 733.0 TFLOPS |
| BF16 Performance | 989.5 TFLOPS | 733.0 TFLOPS |
| FP8 Performance | 1979.0 TFLOPS | 733.0 TFLOPS |
| INT8 Performance | 1979.0 TOPS | 733.0 TOPS |
You can read more about the exact machine specifications here and see pricing for the different GPU types here. GPUs are also subject to resource limits.
Which one to use very much depends on the software you’re using and the computation you are running. In general, the L40S is good for inference and small simulations, while the H200 is good for model training and large simulations due to the larger amount of memory, but you should always benchmark your specific application to find the most appropriate fit.
Be aware that GPUs are much, much more expensive than CPUs! For example, one hour on an Nvidia L40S costs 50 times as much as one hour on a CPU core. You should always consider whether the speed-up gained from the GPU is worth the cost.
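As a back-of-the-envelope check, you can compare the two options in CPU-core-hour equivalents. The runtimes below are made-up numbers for illustration, not measurements:

```shell
# Hypothetical example: the same job takes 200 core-hours on CPUs,
# or 3 hours on one L40S, where each L40S hour is billed like 50 core-hours.
cpu_cost=200
gpu_cost=$((3 * 50))
echo "CPU: ${cpu_cost} core-hours, GPU: ${gpu_cost} core-hour equivalents"
# Here the GPU run is cheaper (150 < 200); with only a 20x speed-up it would not be.
```

The rule of thumb: the GPU only pays off if the speed-up over the CPU cores you would otherwise use exceeds the price ratio.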
To run a job on a node with a GPU device, you need to submit it to the right partition and specify how many GPU devices you are going to use.
For example, to submit an interactive job with one Nvidia L40S GPU allocated:
[fe-open-01]$ srun --gpus 1 -p gpu-l40s --pty bash
Or to submit an interactive job with two Nvidia H200 GPUs allocated:
[fe-open-01]$ srun --gpus 2 -p gpu-h200 --pty bash
Note that the software you're using must support and be configured to use multiple GPUs; otherwise, allocating more GPUs will not make a difference.
If you really don’t care which type of GPU you get, you can specify both partitions:
[fe-open-01]$ srun --gpus 2 -p gpu-l40s,gpu-h200 --pty bash
In a batch script, the same options are given as #SBATCH directives. Here we ask for four Nvidia L40S GPUs:
#!/bin/bash
#SBATCH --account my_project
#SBATCH -c 8
#SBATCH --mem 16g
#SBATCH --partition gpu-l40s
#SBATCH --gpus 4
#SBATCH --time 04:00:00
echo hello world
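Assuming the script above is saved as, say, gpu-job.sh (the file name is arbitrary), it is submitted like any other batch job:

```shell
# Submit the batch script; Slurm prints the assigned job id
sbatch gpu-job.sh
```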
GPU jobs that, after running for two hours, have an average GPU utilization of less than 75% are automatically cancelled!
GPU-time is exceedingly expensive, so you should make sure that you are utilizing the GPU well. You can see the average GPU utilization for your job using jobinfo:
[fe-open-01]$ jobinfo <job id>
...
GPUs : 4
...
GPU utilization : 3.68 GPUs (92%)
In this example, four GPUs were requested for the job, and it used 3.68 GPUs on average, which corresponds to a utilization of 92%. You can also use jobinfo to check the utilization while the job is running.
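The reported percentage is simply the average number of busy GPUs divided by the number of allocated GPUs. For the numbers above:

```shell
# 3.68 busy GPUs on average, 4 GPUs allocated
awk 'BEGIN { printf "%.0f%%\n", 3.68 / 4 * 100 }'
# prints 92%
```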
Alternatively, you can connect to the running job and use the nvidia-smi or nvtop commands to get GPU and memory utilization:
[fe-open-01]$ srun --jobid <job id> --overlap --pty bash
[gn-1001]$ nvidia-smi # or...
[gn-1001]$ nvtop
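nvidia-smi can also produce a compact, scriptable summary of utilization and memory use. The query fields below come from nvidia-smi's own --help-query-gpu list; the guard makes the sketch safe to run on a node without the tool:

```shell
# Per-GPU utilization and memory use in CSV form
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv
else
    echo "nvidia-smi not found (are you on a GPU node?)"
fi
```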
If utilization is low or fluctuates a lot in your test jobs (nvtop makes this easy to spot, as it shows a utilization curve over time), a good first step is to stage your data to the node's local disk:
#!/bin/bash
#SBATCH --account my_project
#SBATCH -c 8
#SBATCH --mem 16g
#SBATCH --partition gpu-l40s
#SBATCH --gpus 1
#SBATCH --time 04:00:00
cp -r path-to-your-data-on-faststorage/ $TMPDIR/
# change paths to refer to $TMPDIR
some-command $TMPDIR/input.dat
Reading from local disk is much faster and more consistent than reading from network storage. You can see the size of the local disk on each type of compute node on the hardware page.
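If the job also writes large outputs, the same idea applies in reverse: write to $TMPDIR and copy the results back to project storage as the script's last step, since node-local scratch is typically cleaned up when the job ends (an assumption worth verifying for this cluster). The destination path below is a placeholder, mirroring the style used above:

```shell
# Last lines of the batch script: stage results out before the job ends
cp -r $TMPDIR/results/ path-to-your-project-folder/
```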