A cluster consists of a large number of interconnected machines.
Each machine is very much like your own desktop computer, except it’s much more powerful. Like your desktop computer, each machine (often referred to as a node) has a CPU, some memory, and a hard disk. The nodes running your programs are usually called compute nodes.
To turn the nodes into a cluster we need three more ingredients: a network, data storage, and a queue manager. In this section we’ll discuss these components in a bit more detail to give you an understanding of how a cluster works.
Nodes in the cluster connect to a high-performance network. The network allows programs running on different nodes to “talk” to each other. It also lets you log in to a node from any other node (with some restrictions).
Bioinformatics is data-intensive, so we can’t make do with a single hard disk as you would in your own computer. Instead we buy hundreds of hard disks and use a distributed file system that can manage files across all of these disks. To the user, it looks like a single, unified file system, no matter which disks the files are actually stored on.
We call our distributed file system fast storage because it’s blazingly fast. It’s up to you whether to keep your files on fast storage or on the slower shared file system.
With storage in place, the cluster is now functional. However, any user would be able to log in to any node and run a program, which may consume the resources of the entire node. This could cause other users’ programs on the same node to crash. Likewise, one user could start thousands of runs of a program on different nodes, consuming the resources of the entire cluster for an unknown amount of time. In short, it would be complete anarchy!
To solve this problem we need a queue manager.
The queue manager is much like the queues in the supermarket. You stand in the queue and wait until it’s your turn to pay. On the cluster, you submit jobs to the queue and your jobs will run on some node chosen by the queue manager, once resources are available.
The queue manager also allows you to specify certain requirements for your program to run. For example, your program may need to run on a node with a lot of memory. If you specify this when submitting your job, the queue manager will make sure to run the job on a node with at least that amount of memory available.
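To make the supermarket analogy a bit more concrete, here is a toy queue manager written in Python. It is a deliberately simplified sketch, not a description of how Slurm works internally: the names `ToyQueueManager`, `submit`, `schedule`, and `finish` are hypothetical, memory is the only resource tracked, and jobs start strictly in the order they were submitted. Each job states how much memory it needs, and the manager only starts it on a node that has at least that much memory free.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_mem_gb: int  # memory not yet reserved by running jobs


@dataclass
class Job:
    name: str
    mem_gb: int  # memory the job requests when it is submitted


@dataclass
class ToyQueueManager:
    """Hypothetical first-come, first-served scheduler (illustration only)."""
    nodes: list
    queue: deque = field(default_factory=deque)
    placements: dict = field(default_factory=dict)  # job name -> node name

    def submit(self, job: Job) -> None:
        # Like joining the supermarket queue: the job waits for its turn.
        self.queue.append(job)
        self.schedule()

    def schedule(self) -> None:
        # Only the job at the front of the queue may start.
        while self.queue:
            job = self.queue[0]
            node = next((n for n in self.nodes if n.free_mem_gb >= job.mem_gb), None)
            if node is None:
                break  # no node has enough free memory yet; everyone keeps waiting
            self.queue.popleft()
            node.free_mem_gb -= job.mem_gb  # reserve memory for this job
            self.placements[job.name] = node.name

    def finish(self, job: Job) -> None:
        # Release the job's memory so that waiting jobs can start.
        node = next(n for n in self.nodes if n.name == self.placements[job.name])
        node.free_mem_gb += job.mem_gb
        self.schedule()


# Usage: two hypothetical nodes, one with a lot more memory than the other.
qm = ToyQueueManager(nodes=[Node("node1", 64), Node("node2", 512)])
align, assemble, variants = Job("align", 32), Job("assemble", 400), Job("variants", 300)
qm.submit(align)     # fits on node1
qm.submit(assemble)  # too big for node1, runs on the high-memory node2
qm.submit(variants)  # neither node has 300 GB free right now, so it waits
waiting_before = [j.name for j in qm.queue]
qm.finish(assemble)  # node2's memory is released and "variants" starts there
```

In this sketch a job that requests more memory than any node has would wait forever and block everyone behind it. Slurm’s real scheduling is considerably more sophisticated (for example, it weighs job priorities and can backfill smaller jobs into gaps), but the core idea is the same: you declare what your job needs, and the queue manager decides where and when it runs.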
At GenomeDK we use a queue manager called Slurm, so to run your programs on the cluster, you’ll be interacting a lot with Slurm.
This ends our tour of the cluster setup. Now, here are a few things that you should keep in mind when using the cluster…
In the coming sections you will learn how to connect to the cluster so that you can start submitting jobs.