Computing with GPUs

Graphics processing units (GPUs) are accelerators that can be used to speed up certain operations. GPUs are especially good at linear algebra computations such as matrix multiplication.

Software must be written specifically to take advantage of one or more GPUs. You should only allocate GPUs for software that you know can use them.
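
A quick way to check whether your software can see a GPU at all is to run a small probe from within a GPU job (see Requesting GPUs below for how to start one). As a sketch, assuming a PyTorch-based workload, the following prints True when a CUDA GPU is visible; substitute the equivalent check for your own framework:

[gn-1001]$ python -c "import torch; print(torch.cuda.is_available())"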

Types of GPUs

We currently provide access to two types of GPUs:

                     Nvidia H200 SXM 141GB   Nvidia L40S 48GB
Partition            gpu-h200                gpu-l40s or gpu-short
Nodes/GPUs           2 nodes x 4 GPUs        2 nodes x 8 GPUs
Architecture         Hopper                  Ada Lovelace
VRAM                 141 GB                  48 GB
Memory Bandwidth     4800 GB/s               864 GB/s
TDP                  700 W                   350 W
FP64 Performance     34.0 TFLOPS             1.4 TFLOPS
FP32 Performance     67.0 TFLOPS             91.6 TFLOPS
FP16 Performance     989.5 TFLOPS            733.0 TFLOPS
BF16 Performance     989.5 TFLOPS            733.0 TFLOPS
FP8 Performance      1979.0 TFLOPS           733.0 TFLOPS
INT8 Performance     1979.0 TOPS             733.0 TOPS

You can read more about the exact machine specifications on the hardware page and see pricing for the different GPU types on the pricing page. GPUs are also subject to resource limits.
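
You can also query the current state of the GPU partitions directly with Slurm's sinfo; the format options used here are standard and show the partition name, node count, generic resources, and time limit:

[fe-open-01]$ sinfo -p gpu-h200,gpu-l40s -o "%P %D %G %l"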

Which one to use very much depends on the software you’re using and the computation you are running. In general, the L40S is good for inference and small simulations, while the H200 is good for model training and large simulations due to the larger amount of memory, but you should always benchmark your specific application to find the most appropriate fit.
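
A simple way to benchmark is to run the same workload once on each GPU type and compare the elapsed times. A minimal sketch, where my_benchmark.sh is a placeholder for your own timed workload:

[fe-open-01]$ srun --gpus 1 -p gpu-l40s --time 01:00:00 ./my_benchmark.sh
[fe-open-01]$ srun --gpus 1 -p gpu-h200 --time 01:00:00 ./my_benchmark.sh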

Be aware that GPUs are much, much more expensive than CPUs! For example, one hour on an Nvidia L40S costs 50x as much as one hour on a CPU core. You should always consider whether the speed-up gained from the GPU is worth the cost.

Requesting GPUs

To run a job on a node with a GPU, you need to submit it to the right partition and specify how many GPUs you are going to use.

For example, to submit an interactive job with one Nvidia L40S GPU allocated:

[fe-open-01]$ srun --gpus 1 -p gpu-l40s --pty bash

Or to submit an interactive job with two Nvidia H200 GPUs allocated:

[fe-open-01]$ srun --gpus 2 -p gpu-h200 --pty bash

Note that the software you’re using must support and be configured to use multiple GPUs; otherwise, allocating more GPUs will not make a difference.
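
Once inside the job, you can verify that all allocated GPUs are actually visible with nvidia-smi; the -L option prints one line per GPU, so with two H200s allocated you should see two entries:

[gn-1001]$ nvidia-smi -L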

If you really don’t care which type of GPU you get, you can specify both partitions:

[fe-open-01]$ srun --gpus 2 -p gpu-l40s,gpu-h200 --pty bash

In a batch script it looks like this. Here we ask for four Nvidia L40S GPUs:

#!/bin/bash
#SBATCH --account my_project
#SBATCH -c 8
#SBATCH --mem 16g
#SBATCH --partition gpu-l40s
#SBATCH --gpus 4
#SBATCH --time 04:00:00

echo hello world
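
Save the script to a file and submit it with sbatch; you can then follow it in the queue with squeue (the file name here is just a placeholder):

[fe-open-01]$ sbatch gpu-job.sh
[fe-open-01]$ squeue -u $USER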

Monitoring GPU utilization

GPU jobs that, after running for two hours, have an average GPU utilization of less than 75% are automatically cancelled!

GPU-time is exceedingly expensive, so you should make sure that you are utilizing the GPU well. You can see the average GPU utilization for your job using jobinfo:

[fe-open-01]$ jobinfo <job id>
...
GPUs                : 4
...
GPU utilization     : 3.68 GPUs (92%)

In this example, four GPUs were requested and the job used 3.68 GPUs on average, which corresponds to a utilization of 92%. You can use jobinfo to check the utilization while the job is running.

Alternatively, you can connect to the running job and use the nvidia-smi or nvtop commands to get GPU and memory utilization:

[fe-open-01]$ srun --jobid <job id> --overlap --pty bash
[gn-1001]$ nvidia-smi # or...
[gn-1001]$ nvtop
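
If you want a utilization log over time rather than a snapshot, you can start a periodic nvidia-smi query in the background at the top of your batch script. A sketch using standard nvidia-smi options, sampling every 30 seconds (some-command is a placeholder for your workload):

# log GPU and memory utilization to a CSV file every 30 s
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 30 > gpu-util.csv &
LOGGER_PID=$!

some-command input.dat

# stop the logger once the main work is done
kill $LOGGER_PID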

Staging data

If utilization is low or varies a lot in your test jobs (this is easy to spot in nvtop, which shows a utilization curve over time), a good first step is to stage your data to the node's local disk:

#!/bin/bash
#SBATCH --account my_project
#SBATCH -c 8
#SBATCH --mem 16g
#SBATCH --partition gpu-l40s
#SBATCH --gpus 1
#SBATCH --time 04:00:00

cp -r path-to-your-data-on-faststorage/ $TMPDIR/
# change paths to refer to $TMPDIR
some-command $TMPDIR/input.dat

Reading from local disk gives much higher and more stable throughput than reading directly from shared storage. You can see the size of the local disk on each type of compute node on the hardware page.
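
Note that $TMPDIR is local to the node and is typically cleared when the job ends, so if your job produces output you want to keep, write it to $TMPDIR as well and copy the results back to shared storage before the script exits. A sketch extending the example above (the paths and the --output flag are placeholders):

# write output to local disk as well
some-command $TMPDIR/input.dat --output $TMPDIR/output.dat

# copy results back to shared storage before the job ends
cp $TMPDIR/output.dat path-to-your-results-on-faststorage/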