For most use cases, we recommend that you install and use the Conda package manager to install software on GenomeDK. Conda provides access to thousands of software packages and is easy to get started with.

For more advanced use cases, or where there’s a substantial need for reproducibility, we recommend Apptainer, which is also supported on GenomeDK.

You can of course also compile software yourself, but you must provide all of the necessary dependencies (compilers, libraries) for the build, e.g. using Conda.

Software installation with Conda¶

Conda can install any kind of software. This means that your entire setup can be installed through Conda (if there are packages for it all). For example, you can create an environment with RStudio, R, and ggplot2 with a single command.

Conda provides access to thousands of packages used in data science and bioinformatics. These packages can be installed with a single command, so you don’t have to worry about compilers, dependencies, and where to put binaries.

Installing Conda¶

Downloading and installing Conda is simple. You just download and run the installer:

[fe-open-01]$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O miniforge.sh
[fe-open-01]$ chmod +x miniforge.sh
[fe-open-01]$ bash miniforge.sh -b
[fe-open-01]$ ./miniforge3/bin/conda init bash

That’s it! The last step (conda init bash) makes sure that Conda will be available when you log in, so now is a good time to open a new connection and check that Conda is available.
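After opening a new connection, a check along the following lines can confirm that the shell finds Conda. This is only a minimal sketch; the helper name check_tool is made up for illustration and is not part of Conda or GenomeDK.

```shell
# Hypothetical helper: report whether a command can be found by the shell.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found (try opening a new connection)"
    fi
}

check_tool conda

# If Conda is available, print its version and where the base install lives.
if command -v conda >/dev/null 2>&1; then
    conda --version
    conda info --base
fi
```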

Now let’s configure Conda to make it super useful.

Configuring Conda¶

Conda can install packages from different channels. This is similar to repositories in other package managers. Here we’ll add a few channels that are commonly used in bioinformatics:

[fe-open-01]$ conda config --append channels bioconda
[fe-open-01]$ conda config --append channels genomedk
[fe-open-01]$ conda config --set channel_priority strict

Conda creates a base environment which contains Conda itself. It’s tempting to install packages in base, but that might ruin your Conda installation. You should never install anything in the base environment.

To prevent you from accidentally installing something in the base environment, we’ll configure Conda so that it doesn’t activate it when you log in:

[fe-open-01]$ conda config --set auto_activate_base false

Once you have done these steps, you should have a config file in your home folder called .condarc that looks like this:

[fe-open-01]$ cat $HOME/.condarc
channels:
  - conda-forge
  - bioconda
  - genomedk
channel_priority: strict
auto_activate_base: false

Finding Conda packages¶

You can easily search for Conda packages through the website anaconda.org or using the conda search command:

[fe-open-01]$ conda search samtools

Remember that the Conda package may not have exactly the same name as the software. For example, the Conda package for the software biobambam2 is just called biobambam, so searching for biobambam2 would not return any results.
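When the exact package name is unknown, a wildcard search can help. The sketch below wraps conda search in a small helper; the name search_loose is hypothetical, while the wildcard syntax itself is part of conda search.

```shell
# Hypothetical helper: list packages whose name contains the given text.
search_loose() {
    conda search "*$1*"
}

# Only run the search when Conda is actually available in this shell.
if command -v conda >/dev/null 2>&1; then
    search_loose bambam    # matches names that contain "bambam"
fi
```

Quoting the pattern keeps the shell from expanding the asterisks against file names in the current directory.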

If you can’t find a suitable Conda package, contact us and we will build a Conda package for you (when possible). Sometimes building a Conda package is not viable and in that case we will build a Singularity/Apptainer image instead.

Installing Conda packages¶

For example, to create a new environment with the newest version of PySAM:

[fe-open-01]$ conda create -n amazing-project pysam

This gives us a clean environment with just the minimal number of packages necessary to support PySAM. To use the software that was installed in the environment, the environment needs to be activated first:

[fe-open-01]$ conda activate amazing-project
(amazing-project) [fe-open-01]$ python -c 'import pysam; print(pysam.__version__)'
0.6.0

Notice that the prompt changed to show you that you’re now in the amazing-project environment.
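Activation changes your interactive shell. For one-off commands, conda run executes a command inside a named environment without activating it first. The following is a sketch that assumes the amazing-project environment from above exists; the helper name run_in_env is made up for illustration.

```shell
# Hypothetical helper: run any command inside the amazing-project environment.
run_in_env() {
    conda run -n amazing-project "$@"
}

# Only try this when Conda is available and the environment has been created.
if command -v conda >/dev/null 2>&1 && conda env list | grep -q '^amazing-project '; then
    run_in_env python -c 'import pysam; print(pysam.__version__)'
fi
```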

You can install further packages in the environment with:

(amazing-project) [fe-open-01]$ conda install r-ggplot2

Since Conda knows about the entire environment you created, it can tell you exactly which packages are used in the environment. This is very useful for collaborating with others, since your collaborators can create an exact copy of your environment with a single command.

To export your environment so that others can recreate it:

(amazing-project) [fe-open-01]$ conda env export > environment.yml

The environment.yml file contains an exact specification of your environment and the packages installed. You can put this in your shared project folder. Others will then be able to recreate your environment by running:

[fe-open-01]$ conda env create -f environment.yml
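A full export pins every package, including dependencies and build strings, which may not be installable on a different platform. As a possible complement, conda env export also accepts --from-history, which lists only the packages you asked for explicitly. The sketch below writes both variants; the helper name export_specs and the second file name are made up for illustration.

```shell
# Hypothetical helper: write a full and a portable spec of the active environment.
export_specs() {
    # Exact specification: every package with version and build.
    conda env export > environment.yml
    # Portable specification: only the packages that were explicitly requested.
    conda env export --from-history > environment-portable.yml
}

# Only export when Conda is available and an environment is active.
if command -v conda >/dev/null 2>&1 && [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
    export_specs
fi
```

Either file can be passed to conda env create -f; the portable one lets Conda pick suitable dependency versions again.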

Containers with Apptainer/Singularity¶

Apptainer is a container technology for HPC that used to be called “Singularity”. If you’re familiar with Docker, Apptainer will seem familiar. It can convert most Docker images to its own format (SIF) and run them without issues.

Finding Apptainer images¶

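There is a multitude of repositories for Docker/Apptainer images, for example Docker Hub and the NVIDIA NGC registry (nvcr.io), both of which are used in the examples below.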

Pull an image¶

Apptainer is already installed and configured on GenomeDK, and you should be able to pull and run containers without any further setup.

[fe-open-01]$ apptainer pull docker://biocontainers/blast:2.2.31

This will pull the Docker image for BLAST and convert it to SIF, so it may take a while. In this case, the image will be put in your current working directory as blast_2.2.31.sif.

Be aware that you should pull and convert images once before submitting jobs. That is, never put apptainer pull in a job script.

The images are quite large, so consider putting them in a relevant project folder.
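One way to follow both pieces of advice is to pull each image once into a chosen folder and skip the pull when the file is already there. This is a sketch only: the helper name pull_once and the folder in the usage comment are placeholders. It relies on apptainer pull accepting an output file name before the image URI.

```shell
# Hypothetical helper: pull an image into a folder unless the SIF file exists.
# Usage: pull_once <folder> <file.sif> <uri>
pull_once() {
    local dir=$1 sif=$2 uri=$3
    if [ -f "$dir/$sif" ]; then
        echo "$dir/$sif already exists, skipping pull"
    else
        mkdir -p "$dir" && apptainer pull "$dir/$sif" "$uri"
    fi
}

# Example call on the frontend (the folder is a placeholder for your own project):
#   pull_once /path/to/project/images blast_2.2.31.sif docker://biocontainers/blast:2.2.31
```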

Run a container¶

You can now run a command inside the image:

[fe-open-01]$ apptainer run blast_2.2.31.sif blastp -version
blastp: 2.2.31+
Package: blast 2.2.31, build Apr 23 2016 15:49:47

You can of course do this in job scripts also.
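As a rough illustration of such a job script, the snippet below writes a small Slurm script and checks it for syntax errors. The file name blast_job.sh and the resource values are placeholders, not recommendations, and your jobs may need additional options such as --account.

```shell
# Write a small job script. Resource values are placeholders.
cat > blast_job.sh <<'EOF'
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=4G
#SBATCH --cpus-per-task=1

# The image was pulled beforehand on the frontend, never inside the job.
apptainer run blast_2.2.31.sif blastp -version
EOF

# Check the script for syntax errors before submitting it.
bash -n blast_job.sh && echo "blast_job.sh passed the syntax check"

# Submit from the folder that contains the image with:
#   sbatch blast_job.sh
```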

Apptainer supports the use of GPUs in containers, for example:

[fe-open-01]$ apptainer pull docker://nvcr.io/nvidia/tensorflow:23.08-tf2-py3

Then, on a GPU node (either in an interactive or batch job):

[gn-1001]$ apptainer run --nv tensorflow_23.08-tf2-py3.sif python3 mnist_classify.py

Note the use of the --nv flag.

Building software for CUDA¶

If you need to compile a piece of software that is supposed to use GPUs, you most likely have to do it in a job on one of the GPU nodes, since the headers required for compilation are only located there.

You can get a list of GPU nodes with:

[fe-open-01]$ sinfo -p gpu -N
NODELIST   NODES PARTITION STATE
gn-1001        1       gpu mix
gn-1002        1       gpu alloc

Headers and libraries for compilation are located in /usr/local/cuda/targets/x86_64-linux.
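Many build systems based on GCC pick up extra search paths from the CPATH and LIBRARY_PATH environment variables. The sketch below points them at the directory mentioned above. It assumes the usual CUDA layout with include and lib subdirectories, which is not stated here and should be checked on the GPU node.

```shell
# Directory with CUDA headers and libraries on the GPU nodes (from the text above).
CUDA_TARGET=/usr/local/cuda/targets/x86_64-linux

# Assumption: headers are in include/ and libraries in lib/ below that directory.
export CPATH="$CUDA_TARGET/include${CPATH:+:$CPATH}"
export LIBRARY_PATH="$CUDA_TARGET/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"

# Tell the user whether the headers are actually visible on this machine.
if [ -d "$CUDA_TARGET/include" ]; then
    echo "CUDA headers found in $CUDA_TARGET/include"
else
    echo "CUDA headers not found; are you on a GPU node?"
fi
```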

Read more about how to submit jobs for the GPU nodes here.

Using graphical interfaces¶

There are two options for using programs with a graphical user interface on GenomeDK.

GenomeDK Desktop¶

The most convenient and reliable way to get a graphical interface on GenomeDK is through the GenomeDK Desktop. You can log in with your existing (open) user credentials and two-factor token.

The Desktop provides a full virtual desktop inside your browser and requires no software to be installed on your own machine. Once connected, you can access all of your projects as usual and launch graphical applications directly.

The desktop environment runs on the frontend, so all of the usual guidelines about not running computations on the frontend still apply. However, you can start an interactive job and launch a graphical application (e.g. RStudio) inside the job.

Session persistence¶

Desktop sessions are persistent, meaning that you can log out of the Desktop and log in later and all of your applications, windows, etc. will still be available.

However, sessions time out after 72 hours of inactivity (not logging in or using the session). This kills all processes in the session. Unsaved files will be lost.

Clipboard¶

The Desktop runs directly in your browser and is thus limited by browser functionality and security measures. This is mostly noticeable in the way copy-paste is handled, as browsers do not allow direct access for the Desktop to manipulate your clipboard.

To paste text from your local computer into the Desktop:

copy the text as usual on your local computer,
go to the Desktop and click “Show clipboard” in the top menu,
paste the text into the text area and click “Hide clipboard”,
inside the Desktop session, focus the application you wish to paste into, then right-click and select “Paste”.

To copy text from inside the Desktop to your local computer:

inside the Desktop, select the text you wish to copy,
click “Show clipboard” in the top menu,
the text you selected should be present in the text area,
select the text, right-click and select “Copy”,
you can now paste the text into any application on your local computer.

X-forwarding¶

You can use X-forwarding to tunnel individual graphical programs to your local desktop.

On Linux you simply need to tell SSH that you wish to enable X-forwarding. To do this, add -X to the ssh command when logging in to the cluster, for example:

[local]$ ssh -X USERNAME@login.genome.au.dk

You should then be able to open e.g. Firefox on the frontend:

[fe-open-01]$ firefox

Since macOS does not include an X server, you will need to download and install XQuartz on your computer, then reboot once it is installed. After that, you just need to tell SSH that you wish to enable X-forwarding. To do this, add -X to the ssh command when logging in to the cluster, for example:

[local]$ ssh -X USERNAME@login.genome.au.dk

You should then be able to open e.g. Firefox on the frontend:

[fe-open-01]$ firefox

On Windows, we recommend that you use MobaXterm which has an integrated X server.
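Whichever platform you connect from, SSH sets the DISPLAY variable on the remote side when X-forwarding is active. A quick check on the frontend could look like the sketch below; the helper name check_display is made up for illustration.

```shell
# Hypothetical helper: report whether this session has X-forwarding enabled.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X-forwarding looks active (DISPLAY=$DISPLAY)"
    else
        echo "DISPLAY is not set; reconnect with ssh -X"
    fi
}

check_display
```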

Using a terminal multiplexer¶

Using a terminal multiplexer allows you to keep your terminal session running on the cluster, even when your SSH connection is closed. You can even reconnect from a different computer and get your session back.

We recommend that you use either tmux or screen.
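With tmux, a convenient pattern is a single command that attaches to a named session if it exists and creates it otherwise. The sketch below uses tmux new-session with the -A flag for this; the helper name work_session and the default session name are made up for illustration.

```shell
# Hypothetical helper: attach to a named tmux session, creating it if needed.
work_session() {
    tmux new-session -A -s "${1:-work}"
}

# After logging in, run for example:
#   work_session analysis
# Detach with Ctrl-b d; the session and its programs keep running.
# List existing sessions with: tmux ls
```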

