Installing and using software

We recommend that you install and use the Conda package manager to install software on GenomeDK. However, we used to have another mechanism for installing software that has now been deprecated. If you used this mechanism before, please read through the next section for instructions on how to transition safely to Conda. If you’re a new user you may skip the next section and jump directly to Installing the Conda package manager.

Why Conda? The clever thing about Conda is that it allows you to use separate environments for separate projects. If you have a project where you’ve installed a bunch of packages for Python or R, there is no reason for those to accidentally seep into your next project. If you want to try different versions of some package, you can just create separate environments for them instead of installing and uninstalling multiple times. With separate environments you force yourself to make the dependencies for each project explicit, which in turn makes it easier for collaborators to run your code and improves reproducibility.

Conda also provides access to thousands of packages used in data science and bioinformatics. These packages can be installed with a single command, so you don’t have to worry about compilers, dependencies, and where to put binaries.

For old users only…

Previously, GenomeDK made software available to users through a special mechanism called /com/extra, which allowed users to load specific software packages. However, this approach has several problems. If you are already using software from /com/extra, note that it may not be supported in the future and that no new software will be made available through this mechanism.

Also, note that software installed through the old mechanism may interfere with your Conda environments. If you wish to use Conda, we therefore encourage you to edit your .bashrc and .bash_profile files and remove all lines that load software from /com/extra.

Additionally, you should ensure that none of the above files reference any system Python installation or related modules. It’s also a good idea to remove any reference to /com/extra/stable.
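
One way to find the lines that need attention is to search your startup files for the old path. The following is a minimal sketch, assuming the files live in your home directory; the function name find_com_extra is only illustrative and is not a GenomeDK tool.

```shell
# List every line (with file name and line number) that mentions /com/extra.
# find_com_extra is a hypothetical helper name used for illustration.
find_com_extra() {
    # -n: show line numbers, -H: always show the file name,
    # -s: stay quiet about files that do not exist.
    grep -nHs '/com/extra' "$@"
}

find_com_extra "$HOME/.bashrc" "$HOME/.bash_profile" \
    || echo "No references to /com/extra found."
```

Each reported line can then be removed or commented out in a text editor.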

Installing the Conda package manager

Downloading and installing Conda is very simple. You just download and run the installer:

[fe1]$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
[fe1]$ chmod +x Miniconda3-latest-Linux-x86_64.sh
[fe1]$ ./Miniconda3-latest-Linux-x86_64.sh -b
[fe1]$ ~/miniconda3/bin/conda init

That’s it! The last step makes sure that Conda will be available when you log in, so now is a good time to open a new connection and check that Conda is available.

Now let’s configure Conda to make it super useful.

Configuring Conda

Conda can install packages from different channels. This is similar to repositories in other package managers. Here we’ll add a few channels that are commonly used in bioinformatics:

[fe1]$ conda config --add channels defaults
[fe1]$ conda config --add channels bioconda
[fe1]$ conda config --add channels conda-forge
[fe1]$ conda config --add channels genomedk

Finally, to make Conda more predictable, we use strict channel priority:

[fe1]$ conda config --set channel_priority strict
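
To double-check the result, you can ask Conda to print the settings back. A small sketch follows; show_conda_setup is just an illustrative name. Because --add puts a channel at the top of the list, the channel added last (genomedk) should be listed first and therefore have the highest priority.

```shell
# Print the configured channels (highest priority first) and the priority mode.
# show_conda_setup is a hypothetical helper name used for illustration.
show_conda_setup() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    conda config --show channels
    conda config --show channel_priority
}

show_conda_setup || true
```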

Searching for packages

You can easily search for Conda packages through the website anaconda.org or using the conda search command:

[fe1]$ conda search rstudio

Remember that the Conda package may not have exactly the same name as the software. For example, the Conda package for the software biobambam2 is just called biobambam, so searching for biobambam2 would not return any results.
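
If you are unsure of the exact package name, a wildcard search can help. The sketch below wraps conda search in a small function (search_like is an illustrative name, not a Conda command) that matches any package whose name contains the given text.

```shell
# Search for packages whose name contains the given text.
# search_like is a hypothetical helper; the quotes keep the shell
# from expanding the * characters before Conda sees them.
search_like() {
    conda search "*$1*"
}

# Usage (requires Conda): finds biobambam even if you only type "bambam".
# search_like bambam
```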

Using environments

A fresh Conda installation comes with a single environment known as the base environment. To activate the base environment, just type:

[fe1]$ conda activate
(base) [fe1]$

You now have access to the software installed in the base environment.

Here is how the usage might look if we want to create a new environment with the newest version of PySAM:

[fe1]$ conda create --name amazing-project pysam python=3
Solving environment: done

## Package Plan ##

  environment location: /Users/das/.conda/envs/amazing-project

  added / updated specs:
    - pysam
    - python=3


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pysam-0.15.1               |   py36h0380709_0         2.0 MB  bioconda
    bcftools-1.9               |       h4da6232_0         789 KB  bioconda
    samtools-1.9               |       h8ee4bcc_1         526 KB  bioconda
    setuptools-40.4.3          |           py36_0         556 KB
    certifi-2018.10.15         |           py36_0         138 KB
    libcurl-7.61.1             |       hf30b1f0_0         457 KB
    libffi-3.2.1               |                1          41 KB  bioconda
    htslib-1.9                 |       hc238db4_4         1.2 MB  bioconda
    curl-7.61.1                |       ha441bb4_0         135 KB
    wheel-0.32.2               |           py36_0          35 KB
    libdeflate-1.0             |       h470a237_0          44 KB  bioconda
    bzip2-1.0.6                |       h1de35cc_5         149 KB
    ------------------------------------------------------------
                                           Total:         6.0 MB

The following NEW packages will be INSTALLED:

    bcftools:        1.9-h4da6232_0          bioconda
    bzip2:           1.0.6-h1de35cc_5
    ca-certificates: 2018.03.07-0
    certifi:         2018.10.15-py36_0
    curl:            7.61.1-ha441bb4_0
    htslib:          1.9-hc238db4_4          bioconda
    libcurl:         7.61.1-hf30b1f0_0
    libcxx:          4.0.1-hcfea43d_1
    libcxxabi:       4.0.1-hcfea43d_1
    libdeflate:      1.0-h470a237_0          bioconda
    libedit:         3.1.20170329-hb402a30_2
    libffi:          3.2.1-1                 bioconda
    libssh2:         1.8.0-h322a93b_4
    ncurses:         6.1-h0a44026_0
    openssl:         1.0.2p-h1de35cc_0
    pip:             10.0.1-py36_0
    pysam:           0.15.1-py36h0380709_0   bioconda
    python:          3.6.6-hc167b69_0
    readline:        7.0-h1de35cc_5
    samtools:        1.9-h8ee4bcc_1          bioconda
    setuptools:      40.4.3-py36_0
    sqlite:          3.25.2-ha441bb4_0
    tk:              8.6.8-ha441bb4_0
    wheel:           0.32.2-py36_0
    xz:              5.2.4-h1de35cc_4
    zlib:            1.2.11-hf3cbc9b_2

Proceed ([y]/n)? y


Downloading and Extracting Packages
pysam-0.15.1         | 2.0 MB    | ################################## | 100%
bcftools-1.9         | 789 KB    | ################################## | 100%
samtools-1.9         | 526 KB    | ################################## | 100%
setuptools-40.4.3    | 556 KB    | ################################## | 100%
certifi-2018.10.15   | 138 KB    | ################################## | 100%
libcurl-7.61.1       | 457 KB    | ################################## | 100%
libffi-3.2.1         | 41 KB     | ################################## | 100%
htslib-1.9           | 1.2 MB    | ################################## | 100%
curl-7.61.1          | 135 KB    | ################################## | 100%
wheel-0.32.2         | 35 KB     | ################################## | 100%
libdeflate-1.0       | 44 KB     | ################################## | 100%
bzip2-1.0.6          | 149 KB    | ################################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate amazing-project
#
# To deactivate an active environment, use
#
#     $ conda deactivate

This gives us a clean environment with just the minimal number of packages necessary to support PySAM. To use the software that was installed in the environment, the environment needs to be activated first:

[fe1]$ conda activate amazing-project
(amazing-project) [fe1]$ python -c 'import pysam; print(pysam.__version__)'
0.15.1

Notice that the prompt changed to show you that you’re now in the amazing-project environment.

Conda can install any kind of software. This means that your entire setup can be installed through Conda, provided packages exist for all of it. For example, you can create an environment with RStudio, R, and ggplot2 with a single command.
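
As a sketch of what such a single command might look like, the helper below creates an environment with the three programs. The function name create_r_env is hypothetical, and the package names rstudio, r-base, and r-ggplot2 are assumptions based on common Conda naming; check them with conda search before relying on them.

```shell
# Create an environment containing RStudio, R, and ggplot2 in one command.
# create_r_env is a hypothetical helper; the package names are assumed
# and should be verified with "conda search" on your own setup.
create_r_env() {
    conda create --name "$1" rstudio r-base r-ggplot2
}

# Usage (requires Conda):
# create_r_env plotting-project
```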

Command reference

To install software in the currently activated environment:

(amazing-project) [fe1]$ conda install PACKAGE-NAME

To remove a software package from the currently activated environment:

(amazing-project) [fe1]$ conda remove PACKAGE-NAME

To update a software package in the currently activated environment:

(amazing-project) [fe1]$ conda update PACKAGE-NAME

Since Conda knows about the entire environment you created, it can tell you exactly which packages are used in the environment. This is very useful for collaborating with others, since your collaborators can create an exact copy of your environment with a single command.

To export your environment so that others can recreate it:

(amazing-project) [fe1]$ conda env export > environment.yml

The environment.yml file contains an exact specification of your environment and the packages installed. You can put this in your shared project folder. Others will then be able to recreate your environment by running:

[fe1]$ conda env create -f environment.yml
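
For illustration, a hand-written environment file for the example environment might look roughly like the one created below. This is only a sketch: the file name environment-example.yml and the unpinned package list are assumptions, and a file produced by conda env export will be longer because it records the exact version and build of every package.

```shell
# Write a minimal, hand-written environment file (illustrative content only).
cat > environment-example.yml <<'EOF'
name: amazing-project
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python=3
  - pysam
EOF

# Recreate the environment from the file (requires Conda):
# conda env create -f environment-example.yml
```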

You can read more about using environments for projects here. There’s also a cheat sheet with Conda commands available.

I don’t think I can use Conda because…

A Conda package is not available

In this case you can contact us and we will build a Conda package for you (when possible). Sometimes building a Conda package is not viable and in that case we will build a Singularity image instead.

I’m part of a project that dictates the software I should use

In this case the project should, and probably will, supply you with either a set of Conda packages or Singularity images. If not, most or all of the software will probably be available through Conda anyway, so you can still set up an environment with the software.

Using graphical interfaces

There are two options for using programs with a graphical user interface on GenomeDK.

X-forwarding

You can use X-forwarding to tunnel individual graphical programs to your local desktop. This works well for many programs, but programs that do fancy graphics or anything animated might not work well.

On Linux you simply need to tell SSH that you wish to enable X-forwarding. To do this, add -X to the ssh command when logging in to the cluster, for example:

[local]$ ssh -X USERNAME@login.genome.au.dk

You should then be able to open e.g. Firefox on the frontend:

[fe1]$ firefox

Since macOS does not include an X server, you will need to download and install XQuartz on your computer. After installing it, reboot the computer. Then you just need to tell SSH that you wish to enable X-forwarding. To do this, add -X to the ssh command when logging in to the cluster, for example:

[local]$ ssh -X USERNAME@login.genome.au.dk

You should then be able to open e.g. Firefox on the frontend:

[fe1]$ firefox

On Windows, we recommend that you use MobaXterm which has an integrated X server.

VNC

If you want to use a full virtual desktop you can use a VNC program. There are lots of options, but we recommend TightVNC, which works on Linux, macOS, and Windows. When downloading TightVNC, we recommend getting “TightVNC Java Viewer” from the download section. It downloads a ZIP archive which contains an executable JAR file.

To use VNC you first need to log in to the frontend and start a VNC server. The server is started with the vncserver command:

[fe1]$ vncserver

The command reports the display id assigned to the server (:3 in this example). The display id is needed when you want to connect the VNC client.

To connect to the running VNC server, an SSH tunnel through the login node has to be established. In the case of TightVNC, the tunneling option is included in the software itself, and the following settings should be sufficient:

[Figure: TightVNC connection settings]

Note the “Port” field! The number specified must be 5900 plus the display ID, which in this example was :3. Thus, the port number becomes 5903.
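
The port arithmetic can be sketched as a tiny shell function. The name vnc_port is hypothetical; it accepts the display id with or without the leading colon.

```shell
# Compute the TCP port for a VNC display id (5900 + id).
# vnc_port is a hypothetical helper name; it accepts "3" or ":3".
vnc_port() {
    display="${1#:}"   # drop a leading colon if present
    echo $((5900 + ${display}))
}

vnc_port :3   # prints 5903
```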