NCC is replacing its existing PBS solution with SLURM. The two systems are very similar, but the command names and the job specification syntax differ slightly.

For instance, a PBS GPU job script such as:

#!/bin/bash
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -q debug
#PBS -N HiThere
#PBS -j oe

cd $PBS_O_WORKDIR
module load cuda/8.0


must be updated to SLURM format:

#!/bin/bash
#SBATCH -c 1
#SBATCH --gres=gpu
#SBATCH -p gpu-small
#SBATCH --qos=debug
#SBATCH --job-name=HiThere

source /etc/profile
module load cuda/8.0


Note that cd $PBS_O_WORKDIR is no longer needed in the SLURM script, because SLURM starts your job script in the submission directory by default.
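
For a script that has to run under both schedulers during the transition, one possible approach is to resolve the working directory from whichever variable is set. The sketch below assumes that SLURM exports SLURM_SUBMIT_DIR (its counterpart to PBS_O_WORKDIR); the variable name workdir is only illustrative.

```shell
#!/bin/bash
# Pick the submission directory from whichever scheduler launched the job.
# SLURM_SUBMIT_DIR is set by SLURM and PBS_O_WORKDIR by PBS; fall back to
# the current directory when neither is set (e.g. an interactive test run).
workdir="${SLURM_SUBMIT_DIR:-${PBS_O_WORKDIR:-$PWD}}"
cd "$workdir" || exit 1
echo "Working directory: $workdir"
```

Under SLURM the cd is redundant, so it can be removed once the PBS version of the script is retired.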

For almost all instructions, there is a one-to-one mapping between PBS and SLURM as defined in the Reference below.


In your job script, you must specify one of the following partitions:

| Partition | Defaults | Limits per job | Available GPUs | Restricted to |
|---|---|---|---|---|
| tpg-gpu-small | cpu=1, mem=2G/cpu | gpu≥1, cpu≤4, mem≤28G | 40 | taught PG |
| res-gpu-small | cpu=1, mem=2G/cpu | gpu≥1, cpu≤4, mem≤28G | 48 | research PG & staff |
| res-gpu-large | cpu=1, mem=2G/cpu | gpu≥1, cpu≤16, mem≤28G | 4 | research PG & staff |

* On the CPU partition, a job can span multiple nodes; however, there is a limit of 32 cores and 60G of memory per node.
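
As an illustration of how a request can be kept inside these limits, here is a minimal sketch of a job script for the res-gpu-small partition. The resource values are arbitrary examples chosen to fit the limits listed above, not recommendations. The echo line only reports what SLURM exported; the fallback values are an assumption for when the script is run outside SLURM as a quick syntax check.

```shell
#!/bin/bash
# Partition taken from the list above.
#SBATCH -p res-gpu-small
#SBATCH --qos=debug
# GPU partitions require at least one GPU per job.
#SBATCH --gres=gpu:1
# 2 cores and 4G of memory stay within the cpu<=4 and mem<=28G limits.
#SBATCH -c 2
#SBATCH --mem=4g
#SBATCH --job-name=HiThere

# SLURM sets these variables inside a job; the defaults after ":-" are
# used only when the script runs outside the scheduler.
cpus="${SLURM_CPUS_PER_TASK:-1}"
echo "Job ${SLURM_JOB_ID:-unknown} on partition ${SLURM_JOB_PARTITION:-unknown} with $cpus CPU(s)"
```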


In your job script, you can specify the following QOS:

| QOS | Max jobs per user | Default walltime | Max walltime | Comment |
|---|---|---|---|---|
| debug | | 30 minutes | 2 hours | |
| short | | 30 minutes | 2 days | |
| long-high-prio | 1 | 30 minutes | 7 days | |
| long-low-prio | | 30 minutes | 7 days | Job might be preempted |
| long-cpu | | 30 minutes | 14 days | cpu partition only |

With the move to SLURM, we are experimenting with preemption. Please read the preemption information if you are using the long-low-prio QOS to avoid data loss.
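
One common way to limit data loss under long-low-prio is to record progress in a checkpoint file, so that a restarted job continues where it stopped. The sketch below assumes that a preempted job is requeued and rerun from the top of the script, which depends on the site configuration described in the preemption information. The file name progress.chk, the step count, and the do_step function are hypothetical placeholders.

```shell
#!/bin/bash
#SBATCH -p res-gpu-small
#SBATCH --gres=gpu:1
#SBATCH --qos=long-low-prio
# Mark the job as eligible for requeueing.
#SBATCH --requeue

checkpoint="progress.chk"   # hypothetical checkpoint file name
total_steps=5               # hypothetical amount of work

do_step() {
    # Placeholder for one unit of real work.
    echo "step $1 done"
}

# Resume after the last completed step, or start at step 1 on a fresh run.
last_done=0
if [ -f "$checkpoint" ]; then
    last_done=$(cat "$checkpoint")
fi

step=$((last_done + 1))
while [ "$step" -le "$total_steps" ]; do
    do_step "$step"
    # Write to a temporary file first, then rename, so an interruption
    # never leaves a half-written checkpoint behind.
    echo "$step" > "$checkpoint.tmp" && mv "$checkpoint.tmp" "$checkpoint"
    step=$((step + 1))
done
```

Only the step that was running when the job was interrupted is repeated after a restart; completed steps are skipped.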


Those instructions were heavily inspired by:

| User commands | PBS/Torque | SLURM |
|---|---|---|
| Job submission | qsub <job script> | sbatch <job script> |
| Job submission | qsub -q debug -l nodes=2:ppn=16 -l mem=64g <job script> | sbatch -p debug -N 2 -c 16 --mem=64g <job script> |
| Job deletion | qdel <job_id> | scancel <job_id> |
| Job deletion | qdel ALL | scancel -u <user> |
| List jobs | qstat [-u user] | squeue [-u user] [-l for long format] |
| Job status | qstat -f <job_id> | scontrol show job <job_id> |
| Job hold | qhold <job_id> | scontrol hold <job_id> |
| Job release | qrls <job_id> | scontrol release <job_id> |
| Node status | pbsnodes -l | sinfo -N -l |
| Interactive job | qsub -I -l nodes=1:ppn=1 /bin/bash | srun -N 1 -c 1 --pty /bin/bash |
| X GUI | | sview |
| Node list (entry per core) | $PBS_NODEFILE | $PBS_NODEFILE (still supported) |
| Slurm node list | | $SLURM_JOB_NODELIST (new format) |

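
Several of the commands above take a job ID, so scripts that chain them usually need to capture the ID at submission time. The sketch below assumes the usual single-cluster confirmation line printed by sbatch ("Submitted batch job <id>"). The function name job_id_from_sbatch and the script name job.sh are hypothetical.

```shell
#!/bin/bash
# Extract the job ID from sbatch's confirmation line by keeping only the
# text after the last space, e.g. "Submitted batch job 12345" -> "12345".
job_id_from_sbatch() {
    echo "${1##* }"
}

# Demonstration with a literal confirmation line (no scheduler needed).
job_id=$(job_id_from_sbatch "Submitted batch job 12345")
echo "$job_id"

# On the cluster, the same ID could then be passed to the other commands:
#   job_id=$(job_id_from_sbatch "$(sbatch job.sh)")
#   scontrol show job "$job_id"
#   scancel "$job_id"
```

Alternatively, sbatch --parsable prints the job ID without the surrounding text.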
| Job Specification | PBS/Torque | SLURM |
|---|---|---|
| Script directive | #PBS | #SBATCH |
| Queue | -q <queue> | -p <partition> |
| Node count | -l nodes=<count> | -N <min[-max]> |
| Cores (cpu) per node | -l ppn=<count> | -c <count> |
| Memory size | -l mem=16384 | --mem=16g OR --mem-per-cpu=2g |
| Wall clock limit | -l walltime=<hh:mm:ss> | -t <days-hh:mm:ss> |
| GPU count | -l gpus=X | --gres=gpu:X |
| Pascal GPU count | -l gpus=X:PASCAL | --gres=gpu:pascal:X |
| Kepler GPU count | -l gpus=X:KEPLER | --gres=gpu:kepler:X |
| Standard output file | -o <file_name> | -o <file_name> |
| Standard error file | -e <file_name> | -e <file_name> |
| Combine stdout/err | -j oe | (use -o without -e) [standard behaviour] |
| Direct output to directory | -o <directory> | -o "directory/slurm-%j.out" |
| Event notification | -m abe | --mail-type=[BEGIN, END, FAIL, REQUEUE, or ALL] |
| Email address | -M <address> | --mail-user=<address> |
| Job name | -N <name> | --job-name=<name> |
| Node sharing | only for same user | for all users if not --exclusive |
| Node sharing | | --exclusive OR --oversubscribe (--share in older SLURM releases) |
| Job dependency | -W depend=afterok:<jobid> | --dependency=afterok:<jobid> |
| Node preference | | --nodelist=<nodes> AND/OR --exclude=<nodes> |
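
To show how the mapping above applies to a real script header, here is a minimal sketch of a sed-based filter that rewrites a few common #PBS directives. The function name pbs_to_sbatch is hypothetical, and only the patterns listed in it are handled. The -q directive is deliberately left untouched, because the matching partition and QOS are site-specific and need to be chosen by hand.

```shell
#!/bin/bash
# Rewrite a handful of #PBS directives as #SBATCH directives.
# Lines that match none of the patterns pass through unchanged.
pbs_to_sbatch() {
    sed -E \
        -e 's/^#PBS -N (.+)$/#SBATCH --job-name=\1/' \
        -e 's/^#PBS -M (.+)$/#SBATCH --mail-user=\1/' \
        -e 's/^#PBS -l walltime=(.+)$/#SBATCH -t \1/' \
        -e 's/^#PBS -l nodes=([0-9]+):ppn=([0-9]+):gpus=([0-9]+)$/#SBATCH -N \1 -c \2 --gres=gpu:\3/' \
        -e 's/^#PBS -l nodes=([0-9]+):ppn=([0-9]+)$/#SBATCH -N \1 -c \2/' \
        -e '/^#PBS -j oe$/d'
}

# Example: translate the header of the PBS GPU job shown earlier.
# "-j oe" is dropped because SLURM combines stdout and stderr by default.
translated=$(printf '%s\n' \
    '#PBS -l nodes=1:ppn=1:gpus=1' \
    '#PBS -N HiThere' \
    '#PBS -j oe' \
    'module load cuda/8.0' | pbs_to_sbatch)
echo "$translated"
# Prints:
#   #SBATCH -N 1 -c 1 --gres=gpu:1
#   #SBATCH --job-name=HiThere
#   module load cuda/8.0
```

A translated script still needs a manual review, in particular to add the -p and --qos lines required on this cluster.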