

NCC Documentation


The NVIDIA CUDA Centre (NCC) GPU system is a shared computing facility provided by the Department of Computer Science. We primarily support staff and students in Computer Science, but will consider requests for access from staff (and students with staff support) in other departments. Support requests should be sent to [email protected]; again, support for Computer Science is prioritised. NCC now serves two purposes: first and foremost it is a research GPU cluster, but it also serves as a teaching resource by offering Jupyter notebooks. If you are only using Jupyter, most of the information in this guide is irrelevant to you, so just skip down to the section on Jupyter notebooks.

All activity on the system must go through the SLURM scheduler. If you don't use SLURM to run your code, your account will be blocked by an administrator. This guide explains how to use SLURM on NCC. Please note that the system does not provide large persistent storage: files should be transferred in when starting a job and transferred out after it finishes. The file store is not backed up. Some of the compute servers are currently not on UPS, so we recommend that you implement some kind of regular (e.g. hourly) checkpoint/resume on any long jobs (all head nodes and storage are on UPS).
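The checkpoint/resume recommendation above can be sketched as the following bash skeleton. This is an illustrative pattern, not an NCC-provided script: the file name state.ckpt and the program flags in the comments are hypothetical, and your program must implement its own state saving and loading.

```shell
#!/bin/bash
# If a checkpoint exists, resume from it; otherwise start from scratch.
CKPT="${CKPT:-state.ckpt}"
if [ -f "$CKPT" ]; then
    echo "resuming from checkpoint"
    # ./yourprogram --resume "$CKPT"
else
    echo "starting fresh"
    # ./yourprogram --checkpoint "$CKPT"   # program saves state e.g. hourly
fi
```

If the job is killed (power loss, preemption, walltime), simply resubmitting it picks up from the last saved state.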

We recently held a workshop on how to use NCC. The presentation slides are available here for reference (note that the slides are only visible on the university network).

System Information

The system is comprised of two head nodes, twelve GPU compute servers and six CPU blades. In total, we have 82 GPUs across the GPU compute servers and 246 CPU threads available on the CPU blades.

Requesting Access to the System

In order to connect to the system, you must have an account. If you do not already have an account, you can request one by email to [email protected]. When requesting an account, please provide:

  • your name
  • your CIS username
  • your email address
  • supervisor name
  • your status (e.g. L3 student, MSc taught student, MSc research student, PhD student, visitor, staff).

If you are a member of staff or a student not in Computer Science, please provide a rationale for your research needs. Such students will also need the support of their supervisor for their application, which can be provided by copying the supervisor into the account request email.


If you publish data from or research made possible through NCC, we would appreciate the following acknowledgement:

  • This work has used Durham University’s NCC cluster. NCC has been purchased through Durham University’s strategic investment funds, and is installed and maintained by the Department of Computer Science.

Jupyter Notebooks

We are hosting a Jupyterhub server on ncc1, which will be used for teaching in a variety of modules. If you are a student, your lecturers will inform you if they are using Jupyter notebooks, and will provide you with a URL for that module if appropriate. In order to access the server you either need to be on the University network or using the VPN (see connection details below). Students should request VPN access and provide the details of the module and lecturer. You only need to do this once, not for every module that is using Jupyter. Note that the VPN is secured with Multi Factor Authentication. Alternatively you can access the server via a remote browser launched from AppsAnywhere.

Researchers interested in using Jupyter notebooks for research work unrelated to teaching can use a test server currently running which supports Python 3.6, C, Haskell and R. This server allows the use of GPU-backed notebooks and provides a similar level of compute to Google Colab.

It is now possible to configure your own Python environment for use with Jupyter notebooks on NCC. There is a script available in the assignments tab of the Jupyterhub landing page. Click “Assignments” and “Fetch” the released assignment. This adds a new folder to your files tab called “Create Jupyter Environment Script”, and inside that folder is a Jupyter notebook that can create your new environment and make it available within a Jupyter notebook. Follow the instructions in the provided setup script carefully, and only edit the variables that you are told to. If you are intending to use PyTorch on a GPU notebook you should ensure that you don’t request any CUDA modules in the environment creation script. PyTorch is shipped with its own CUDA libraries and loading an additional CUDA module will cause conflicts.

Connecting to the System

The system is configured with Ubuntu Linux 18.04 and is accessible via SSH through two head nodes:

  • our new head node;
  • our old head node, restricted to research students and staff.


To connect to the system from a Linux machine, open a terminal and type: ssh [email protected]


On a Windows 10 machine, you can connect to NCC in the same way as on Linux by using the Windows Subsystem for Linux (WSL); this is the best method to access NCC from Windows. See this link to install WSL, and this link for an introduction to WSL. Once you have a Linux-like terminal running on Windows through WSL, you can connect to NCC using: ssh [email protected]

On other versions of Windows, you can access the system using PuTTY. From the PuTTY GUI, connect to the head node and type your username and password when prompted. On Windows 10, we recommend WSL over PuTTY because transferring files and creating new terminals is much easier with WSL.

Outside the University

NCC is not directly accessible from outside the university for security reasons. You can either SSH to the university gateway (Mira) or use the VPN. You have to request access to both of these services via forms on the CIS webpages, and both are secured with Multi-Factor Authentication.
See the CIS webpages for information on Mira and on the VPN. The direct link to the form to request VPN access is here.

In practice, Mira seems to be faster than the VPN if you have a fast fibre broadband connection. For general usage, though, we recommend the VPN, as it has the advantage mentioned below of allowing files to be copied directly using scp.

From a Linux client or Windows WSL, it is possible to transparently connect to NCC using Mira as a proxy. To do so, add (or create) the following in your .ssh/config, replacing the placeholders with the NCC head node hostname and your Mira username:

Host <ncc-hostname>
ProxyCommand ssh <your-username>@mira nc %h %p

then connect to NCC as if from inside the university. You will be prompted for your password twice (once for Mira, once for NCC). This is advantageous because it means that you can copy files using scp directly to/from NCC without having to copy them to Mira first.

If you have never used Linux before

The system is based on Ubuntu Linux 20.04 and does not have a graphical interface, therefore you must get used to the Linux command line in order to use the system. You can get familiar with the Linux terminal using one of the many tutorials available on the Internet (for instance, this one).

Getting your Data on the System

NCC is available through SSH, so you can copy your programs and input data to and from the system using scp, rsync or sshfs. While the command line is the recommended way of doing this, you can also use a GUI tool such as FileZilla, which is available on both Windows and Linux. For your code, Git (using Bitbucket or GitHub, for instance) is the best way to keep your code synchronised.

NCC is not a storage system and is not backed up. You must not leave large amounts of unused data on NCC for extended periods. Please copy your results off NCC and delete any temporary files or job output from the system as soon as possible.

Please note that, apart from a few exceptions, your NCC account and data will be deleted once your CIS account ceases to exist, so make sure that all important data has been copied off the system if you are leaving the university or graduating.


Your storage space on NCC is limited by a quota. The default quota is 100GB if your account is on the old storage node (/home2). Accounts created after July 2023 have a default quota of 250GB and reside on our new storage node (/home3). Your account can be moved from /home2 to /home3 on request, but this will break any Python virtual environments that you have created and you will need to rebuild them.

You can check your quota and current utilization using the command quota. If you are using the old storage, this command will not work and you must use df -h $HOME.

If you need to use more than your current quota, please send a request by email to [email protected].

Running Jobs

Once you have copied your data and code onto NCC, half of the work to get started is done.

NCC uses SLURM, which is also used on the university-wide Hamilton HPC, to schedule jobs on the system. The role of SLURM is to divide the cluster resources fairly between all the jobs submitted by users. Occasionally, this means that your job might wait in a queue for a while when the system is busy processing other jobs. Once SLURM allocates resources to your job, the job is guaranteed full use of those resources until it terminates.

To use SLURM two major components need to be dealt with:

  • the batch scripting language which tells SLURM how to run your job;
  • the command line tools to start and control your job.


Batch Script

When scheduling a job through SLURM, you write a script which the job scheduler parses for instructions to itself, then passes the remaining instructions to the command line for execution once the job starts. SLURM batch scripts can be written in any terminal scripting language; as bash is the default shell environment on NCC, this tutorial uses bash syntax.

Below is a sample SLURM job script. The comments explain what each statement does.

#!/bin/bash
# This line is required to inform the Linux
# command line to parse the script using
# the bash shell

# Instructing SLURM to locate and assign
# X number of nodes with Y number of
# cores in each node.
# X, Y are integers. Refer to the tables
# below for various combinations
#SBATCH -N X
#SBATCH -c Y

# Governs the run time and resource limits
# for the job. Please pick values from the
# partition and QOS tables below
#SBATCH -p "partitionname"
#SBATCH --qos="qosname"
#SBATCH -t HH:MM:SS

# Source the bash profile (required to use the module command)
source /etc/profile

# Run your program (replace this with your program)
./yourprogram
The lines in the script above are required to run any job. The first line sets up the user environment for the job. The -N and -c lines schedule and allocate resources to the job, while the partition and QOS lines control the resource limits applied to it. The partition indicates the set of nodes that the job can run on; each partition has restrictions on the amount of resources a job can use, so pick the partition that suits your needs. The QOS specifies how long the job can run at most; longer QOS have stricter restrictions on the number of jobs that can run at once. Please refer to the next two tables for descriptions of all the available partitions and QOS.

The -t line sets the maximum walltime that the job can run for, in the form #SBATCH -t HH:MM:SS or #SBATCH -t DD-HH. The default walltime is 30 minutes (regardless of QOS and partition).
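For example, the two walltime forms look like this (the values are illustrative; pick ones within your chosen QOS limit):

```shell
#SBATCH -t 02:30:00   # 2 hours 30 minutes
#SBATCH -t 3-12       # 3 days 12 hours
```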

The -N and -c options specify the resources needed by the job. Once your job is running, those resources are allocated exclusively to your script, so specify the exact amount needed. The default amount of memory allocated to a job is 4GB per allocated CPU core. To allocate more (or less) memory, use: #SBATCH --mem=Xg
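As a sketch, a job needing 4 cores but more than the default memory might request (values illustrative):

```shell
#SBATCH -c 4          # 4 CPU cores (default memory would be 4 x 4GB = 16GB)
#SBATCH --mem=24g     # override: request 24GB of RAM for the whole job
```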

When the system is busy, it is in your interest to request as few resources as possible: smaller jobs will start much more quickly than bigger ones. The current scheduler greedily starts jobs as soon as resources are available for them, so large jobs may never start even when they are the oldest in the queue.

Please request the exact amount of resources required by your job. It is not okay to request an entire node for a single-threaded program.

Available Partitions

The cluster is split into several overlapping partitions of nodes and you should use the most appropriate one for your job. There are three partitions. For CPU-only jobs, use the cpu partition, which gives you access to a pool of 134 CPU cores. For GPU jobs, use either the gpu-small or gpu-large partition. The gpu-large partition has a restricted set of nodes, excluding nodes which cannot accommodate large jobs.

The limits are purposefully set to maximise the number of jobs that can be scheduled at once. If you require more memory or CPU than available through the gpu-large partition, please talk to an NCC administrator to get an exemption.

Partition        Defaults           Limits per job           Available GPUs  Restricted to
tpg-gpu-small    cpu=1, mem=2G/cpu  gpu≥1, cpu≤4, mem≤28G    52              taught PG
res-gpu-small    cpu=1, mem=2G/cpu  gpu≥1, cpu≤4, mem≤28G    64***           research PG & staff
res-gpu-large**  cpu=1, mem=2G/cpu  gpu≥1, cpu≤16, mem≤28G   4               research PG & staff

* On the CPU partition, a job can span multiple nodes however there is a limit of 32 cores and 60G memory per node.

** mem refers to system RAM, and not GPU VRAM. So the partitions res-gpu-large and gpu-bigmem just give access to additional CPU cores and system RAM, not larger GPUs.

*** Only 64 of the 82 GPUs we have are directly available for slurm jobs. There are 18 80GB A100 cards that have been virtualised into 126 10GB cards for use with Jupyter. You can learn more about the technology behind this here.

Available QOS

You must also choose a QOS, which defines how many jobs you can run simultaneously and for how long.

In your job script, you must specify one of the following QOS:

QOS             Max jobs per user  Default walltime  Max walltime  Comment
debug           -                  30 minutes        2 hours       -
short           -                  30 minutes        2 days        -
long-high-prio  4                  30 minutes        7 days        -
long-low-prio   5                  30 minutes        7 days        Job might be preempted
long-cpu        -                  30 minutes        14 days       cpu partition only

In addition to the QOS cap on the number of jobs, there is a cap of 4 simultaneously used GPUs per user across all their jobs, and 1 GPU per job. This cap might be raised or lowered without notice based on overall activity on the cluster. If you need to use multiple GPUs in the same job, please read the paragraph "Unusual requirements" below.

Please note that the limits defined in the QOS and partition tables are the maximum values you can set; they are not defaults. Even with the long-high-prio QOS, your job will be killed after 30 minutes unless you have explicitly stated a larger walltime in your script (using -t). Similarly, in the gpu-large partition, your job will be killed if it attempts to use more than 2GB of RAM unless you have explicitly asked for a higher memory limit (using --mem).

With the move to SLURM, we are experimenting with preemption. Please read the preemption information if you are using the long-low-prio QOS to avoid data loss.

A note on memory and swap

Each NCC node has a swap partition. The system uses swap to store some of your program's memory if you run above the allowed amount of RAM (similar to Windows' paging system). This process is entirely transparent to you, and the amount of swap that you use is not restricted; however, it is not guaranteed either.

Therefore your program might be allowed to continue running even if it uses more memory than the amount of RAM requested (using --mem). This is particularly useful if your program needs more memory than is available on NCC. In practice, there are currently some shortcomings: disk read/write buffers count toward the maximum amount of RAM and can cause your program to run over the RAM limit and be killed, so swapping is not guaranteed to work well while reading/writing files. These shortcomings will be addressed in a future upgrade of NCC.

Unusual requirements

If your job does not fit into any of the partitions/QOS above, or you need to use multiple GPUs, please contact [email protected] outlining your requirements so that we can find an appropriate solution. We will try to accommodate any reasonable, non-frivolous request within the capacity of the system.

Please consider alternative solutions before contacting us. Running out of memory is often caused by bugs and design problems rather than real needs. Multi-GPU is never needed just to increase the batch size (gradients can be accumulated over multiple iterations, e.g. with Caffe's iter_size parameter). Besides, writing code that scales well across multiple GPUs is hard. If a task will last more than 2 days, consider splitting it into multiple sequential jobs; you can even submit a new job using sbatch from inside the batch script of another job.
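The sequential-job pattern above can be sketched as a self-resubmitting batch script. This is an illustrative sketch, not an official recipe: the program name, --resume flag and done.flag marker file are hypothetical, and your program must create the marker itself when all work is complete.

```shell
#!/bin/bash
#SBATCH -c 1
#SBATCH --qos=short
#SBATCH -t 2-00

source /etc/profile
# Run one stage, resuming from the previous stage's checkpoint if present
./yourprogram --resume latest.ckpt
# If more work remains, queue the next stage from inside this job
if [ ! -f done.flag ]; then
    sbatch "$0"
fi
```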

GPU Jobs

Please skip this section if you do not want to use GPUs.

You can request a single GPU using: #SBATCH --gres=gpu

Compile your code with both compute capability 6.1 (Pascal) and 7.5 (Turing), or just one of the two if you only intend to work on a single architecture. Your code must be linked with CUDA 8.0 or above. Some programs such as Caffe need the GPU number. On NCC, the GPUs allocated to your job are always numbered from 0: if you asked for one GPU, it will always be GPU 0; if you asked for two GPUs, they will always be 0 and 1; and so on.
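The dual-architecture build described above might look like the following nvcc invocation (the program and file names are illustrative):

```shell
# Build for both Pascal (sm_61) and Turing (sm_75) in one binary
nvcc -gencode arch=compute_61,code=sm_61 \
     -gencode arch=compute_75,code=sm_75 \
     -o myprogram myprogram.cu
```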

NCC provides CUDA libraries using the module system. You can see a list of available modules with the command module avail. To load a specific version of CUDA:

module load cuda/8.0

Note that some libraries such as PyTorch are bundled with their own CUDA libraries and you do not need to load any module.

Here is a sample job script allocating one GPU on one node based on CUDA 8.0:

#!/bin/bash
#SBATCH -c 1
#SBATCH --gres=gpu
#SBATCH -p gpu-small
#SBATCH --qos=debug
#SBATCH --job-name=HiThere

source /etc/profile
module load cuda/8.0

# Run your program (replace this with your program)
./yourprogram


Specific GPUs can be requested by type. The types available are currently pascal, turing, ampere and 1g.10gb and you can specify these in your batch file as follows:

#SBATCH --gres=gpu:ampere:1
Type     GPU                                       Quantity Available
pascal   Titan X or Titan XP                       2 x Titan X, 14 x Titan XP
turing   2080 Ti, Titan RTX or Quadro RTX 8000     24 x 2080 Ti, 8 x Titan RTX, 1 x Quadro RTX 8000
ampere   RTX A6000 or 80GB A100 (PCIe)             3 x RTX A6000, 12 x 80GB A100 (PCIe)
1g.10gb  Virtual 10GB GPU                          126 x Virtual 10GB GPU (each equivalent to 1/7 of the compute performance of an 80GB A100)

Other Optional Definitions

There are a few other batch instructions a user can add to better utilise the resources and to streamline their work.

#SBATCH -e stderr-filename
#SBATCH -o stdout-filename

The two options above allow a user to specify file names for the standard error (-e) and standard output (-o). These can be set to more useful names, but beware that each run will overwrite the previous logs unless the file name is changed. By default, unless -e is specified, SLURM writes stderr to the same file as stdout.

#SBATCH --job-name=jobnamehere Assigns a name to the job, which helps locate it in the queue or in email notifications. The name cannot start with a number or contain spaces. After all the headers, normal bash commands can be used to define the steps the scheduler needs to take (see the example script at the bottom of this page).

#SBATCH --mail-type=[BEGIN, END, FAIL, REQUEUE, or ALL] Defines on which events the server should send an email notification.

#SBATCH --mail-user [email protected] Assign an email address to send updates to.
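Putting these options together, a job header might look like the following (the job name and email address are illustrative; %j is SLURM's placeholder for the job id, which keeps reruns from overwriting each other's logs):

```shell
#SBATCH --job-name=myexperiment
#SBATCH -o myexperiment-%j.out
#SBATCH -e myexperiment-%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH [email protected]
```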

Tools: Starting and controlling your jobs

Once the job script is prepared it has to be passed on to the job scheduler. There are various command line tools, which allow the user to submit, delete and check the status of jobs and queues.

[username@ncc ~]$ sbatch jobscript

The sbatch command submits a batch file to the SLURM job queue. sbatch takes the job script as an argument, as well as any flags that can be used inside a script (see Interactive Jobs for more information).

[username@ncc ~]$ squeue

The squeue command shows the status of all jobs queued and running on the system.

[username@ncc ~]$ scancel jobid

To delete a job from the system, use the scancel command. Note that only jobs submitted by you can be deleted with scancel. If you feel a job is stuck and not responding to scancel, please contact an administrator.

[username@ncc ~]$ sinfo

The sinfo command can be used on the head nodes to see which nodes are down, available, or busy with jobs.

Sample Job Script

The job script below requests one core on one node from SLURM. The job is called HiThere and uses the debug QOS. The standard error and output are combined in one file. The script then changes to the working directory and runs the helloworld program.

#!/bin/bash
#SBATCH -c 1
#SBATCH --qos=debug
#SBATCH --job-name=HiThere

source /etc/profile
# Change to the working directory and run the program
cd yourworkingdirectory
./helloworld


Checking your Jobs

GPU Usage

Whether you want to use one or two GPUs, it pays to double-check that your code is actually using the resources. To do that:

  • Find which machine your job is running on using squeue;
  • SSH to that machine and run nvidia-smi. The process table at the bottom lists all processes using the GPUs. Check that you can see your process in that list; if it is not there, you might have misconfigured your job.
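The two checks above can be sketched as follows (gpuNN is a placeholder for whatever node name squeue reports; %i and %N are squeue's format codes for job id and node list):

```shell
# 1. Find the node(s) your jobs are running on
squeue -u "$USER" -o "%i %N"
# 2. Check GPU activity on the reported node
ssh gpuNN nvidia-smi
```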

CPU and RAM Usage

You can query for statistics about a currently running job using: sstat <jobid>.batch|less -S

You can query for statistics about all finished jobs using: sacct -l |less -S

The peak RAM usage is shown in the column MaxRSS, and the CPU usage is available in the column AveCPU.

Usage Graph

Some general graphs are available on our Ganglia webserver.


Modules

NCC has a module system to load additional software. A full list of the modules available on NCC can be shown using:

[username@ncc ~]$ module avail

Standard Output

By default, SLURM writes the output of your job to a file called slurm-<yourjobid>.out. As the job output is written to a file rather than interactively, most programs (e.g. Python, the C/C++ standard library) buffer the standard output and write it in blocks of 4KB. This can be very inconvenient if you are trying to debug your program by regularly inspecting the output file (e.g. using tail -f).

To avoid this problem, you can force your program to use line-buffering using stdbuf (modify your batch script as follows): stdbuf -oL ./yourprogram


Matlab

To use Matlab, load the desired version in your job script using either module load matlab/2015b or module load matlab/2017a.


Python

We provide both python and python3. If you require a specific package that is not installed on NCC and is available through the standard Ubuntu repository, ask an NCC admin to install it.

If you would like to use a package that is not available in the Ubuntu repository, or you would like a more recent version of a package, you can create your own Python virtual environment using virtualenv (see this very good tutorial). You can also use Anaconda, but beware that Anaconda tends to have a restricted and out-of-date set of packages and does not integrate well with programs installed outside of its environment.
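A sketch of creating a personal environment with the standard venv module (the environment path and the numpy package are illustrative; on NCC you would create it somewhere under your home directory):

```shell
# Create an isolated Python environment and activate it
python3 -m venv "$HOME/envs/myproject"
source "$HOME/envs/myproject/bin/activate"
# Packages now install into the environment, not the system Python
pip install --upgrade pip
pip install numpy
```

Remember to activate the environment in your job scripts too, after sourcing /etc/profile.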


OpenCV

OpenCV 3.1 is available as a module, which you can load using module load opencv/3.1. It comes with Python 2.7 and CUDA 7.5 support.

Do not use this version of OpenCV with CUDA 8.0: it was built specifically against CUDA 7.5, so mixing the two will cause your program to crash with mysterious errors. Even common OpenCV operations may use CUDA internally, resulting in various incompatibilities if your program uses CUDA 8.0.

As an alternative, you can use the version of OpenCV packaged with Anaconda. CUDA support is not enabled in the Anaconda version, so it should work regardless of the CUDA version that you might be using elsewhere in your program.

SLURM State Codes and Pending Reasons

Your job typically passes through several states throughout its execution. We outline below the most common status codes visible in the output of squeue:

  • PD: Pending. Your job is awaiting resource allocation. The reason is displayed in the column on the right; check the list of common reasons below.
  • R: Running. Be patient!
  • CG: Completing. Your job just finished.
  • CA: Cancelled. The job has been cancelled.

Common reasons for pending jobs are:

  • Resources: There are no available resources for your job yet. Just wait and your job should start as soon as resources are available.
  • Priority: Another job with a higher priority than this job is also waiting. Your job will start once resources are available and all higher priority jobs have started. The job priority is used to assign resources as fairly as possible based on past cluster usage.
  • AssocMaxGRESPerJob: You are trying to use more than one GPU. Please get in touch with an administrator if you need multi-GPU.
  • AssocGrpGRES: You have reached the limit on how many GPUs you can use at once. Just wait and your job should start once one of your running jobs finishes.
  • QOSMaxMemoryPerJob: You are trying to use more memory than allowed. Try the gpu-large partition; if you need even more memory, speak to an admin.
  • QOSMaxCpuPerJobLimit: You are trying to use more CPUs than allowed. Try the gpu-large partition; if you need even more CPUs, speak to an admin.
  • QOSMaxWallDurationPerJobLimit: You are trying to run a job for longer than allowed. Try the QOS long-high-prio; if 7 days is not enough, speak to an admin.
  • QOSNotAllowed: You are not allowed to use this QOS, or you are trying to use the long-cpu QOS (reserved for the cpu partition) on one of the GPU partitions.
  • ReqNodeNotAvail: You have explicitly asked for a specific node using --nodelist, but this node is currently down (usually for maintenance).
  • QOSMinGRES: You have submitted a job to a GPU partition but haven’t requested any GPU. Please look at the above job script examples.
  • InvalidAccount: There is an issue with your account, speak to an admin.

A longer list of possible reasons is available here.


Frequently Asked Questions

How can I change my shell to ZSH?

Use the command chsh.ldap youruserid -s /bin/zsh, then log in to NCC again after 15 minutes.

How can multiple users share/work together on the same data?

We can set up a new user group and directory under /projects with the right permissions for you. Please contact an admin for that.

Can you install PyTorch on NCC?

No, like TensorFlow, PyTorch moves too quickly to be installed globally in an HPC environment. Install your own version using virtualenv or Anaconda.

TensorFlow uses a lot of threads, should I book an entire node?

No, despite using many threads, the actual CPU usage of TensorFlow is usually much lower. Empirically estimate the average CPU usage of your code, round it up, and request that number.

My job has been queued for a day and still not started!

NCC might be unusually busy. Double-check that you have not requested more resources than NCC can provide, because SLURM does not always warn you when a request is impossible to fulfill. In particular, absurd requests for a million hours of walltime, 100 CPUs or GPUs, or 1000GB of memory will be silently accepted by SLURM but never actually scheduled. If you requested a large number of CPUs or GPUs, consider reducing the number: SLURM takes into account the cost of running each job and prioritises small jobs over larger ones, so it is unlikely that 16 CPUs will be scheduled any time soon if there is a steady queue of small jobs. If you really need that many resources and NCC is busy, please contact an admin.

Can I interactively debug my program on NCC (e.g. using gdb)?

Yes, however please consider doing it on your own development machine first. If you only have the resources to do so on NCC, you can start an interactive job using: srun -N X -c Y --gres=gpu:Z --pty /bin/bash This will allocate the desired resources and start an interactive shell inside a job; you can then proceed with debugging as usual. Please close the shell (by typing exit or pressing Ctrl+D) as soon as you no longer need it, to avoid wasting resources.

Can you install program X?

Yes, if the program is available through the Ubuntu package manager (see the list here). For Python packages only available through pip, please use virtualenv or Anaconda. For other software, we might consider installing it if it is of general use (e.g. Matlab) or too difficult to install in your home folder.

My code using OpenCV crashes on NCC!

If you are using CUDA, make sure that you have loaded the OpenCV module matching the version of CUDA that you would like to use. Speak to an admin if you can’t find it.

I use x (where x is linux-based) and it’s too difficult to port my code to NCC environment! What can I do?

We have an experimental deployment of Docker. You can have your own Linux distribution with root permissions inside a container on NCC. Speak to an NCC admin if you want to try it out.