Converting from PBS to Slurm

In March 2018, HPC will implement the Simple Linux Utility for Resource Management (Slurm) for job scheduling and resource management, replacing our current resource manager and job scheduler, Adaptive Computing's PBS Torque/Moab. Slurm is an open-source utility that is widely used at national research centers, higher-education research centers, government institutions, and other research institutions across the globe.

How is Slurm different from PBS?

Slurm is different from PBS in a number of important ways, including the commands used to submit and monitor jobs, the syntax used to request resources, and the way environment variables behave.

ATTENTION: We recommend submitting sbatch jobs with the #SBATCH --export=NONE option to establish a clean environment; otherwise, Slurm will propagate the current environment variables to the job. This could affect the behavior of the job, particularly for MPI jobs.
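
For example, assuming a job script named job.slurm, a submission with a clean environment looks like this:

sbatch --export=NONE job.slurm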

Some specific ways in which Slurm is different from PBS include:

  • Slurm will not accept a job if your account has insufficient cycles or if no resources match your request. PBS would queue such a job even though it could never run.
  • Slurm uses the kernel's control groups (cgroups) to limit memory and CPU usage for each job, and those limits are shared across all ssh sessions belonging to the job's owner. This affects anyone accessing a compute node that is running their job: instead of getting new system resources, you share the same limits as your running job.
  • What PBS calls queues, Slurm calls partitions.
  • Resources are assigned per "task"/process (see the example after this list).
  • Environment variables of the submitting process are passed to the job.
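
As an illustration of the per-task model, the directives below (values chosen only for illustration) request four processes, each with one CPU and 1 GB of memory:

#SBATCH --ntasks=4          # four tasks (processes)
#SBATCH --cpus-per-task=1   # one CPU per task
#SBATCH --mem-per-cpu=1GB   # memory is granted per CPU rather than per job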

For full details on how to work with Slurm, see our documentation on running a job with Slurm.

PBS-to-Slurm Command Cheat Sheets

The charts below list the PBS commands you currently use to process jobs, their Slurm equivalents, and the meaning of the new commands. These cheat sheets are intended to help you get up and running with Slurm quickly.

Cheat Sheet: General Commands

PBS Command         | Slurm Command               | Meaning
qsub <job-file>     | sbatch <job-file>           | Submit <job script> to the queue
qsub -I             | salloc <options>            | Request an interactive job
showstart           | squeue --start              | Show estimated start time
qstat <-u username> | squeue <-u username>        | Check jobs for a particular user in the scheduling queue
qstat <queue>       | squeue -p <partition>       | Display queue/partition entries
qstat -f <job_id>   | scontrol show job <job_id>  | Show job details
qdel <job_id>       | scancel <job_id>            | Delete <job_id>
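
For example, a typical session (the script name and job ID are only placeholders) might look like this:

sbatch job.slurm            # prints "Submitted batch job 12345"
squeue -u $USER             # check your jobs in the queue
scontrol show job 12345     # show details for the job
scancel 12345               # cancel the job if needed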

Cheat Sheet: How to Submit Jobs

Submitting jobs in Slurm is straightforward: just replace qsub with one of the commands from the table below.

PBS Command  | Slurm Command     | Meaning
qsub job.pbs | sbatch job.slurm  | Submit batch job to the queue
qsub -I      | salloc <options>  | Submit interactive job to the queue; see the resources below for options
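
For instance, an interactive job asking for one task, 4 GB of memory, and one hour of walltime (values are only examples) could be requested with:

salloc --ntasks=1 --mem=4GB --time=01:00:00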

Cheat Sheet: How to Request Resources

The way you work with resources in Slurm is similar to PBS, but there are some significant differences, including the syntax used to request them. See the table below for the options most frequently used to request resources.

PBS                    | Slurm                         | Meaning
qsub                   | sbatch / salloc               | Submit batch/interactive job script to queue*
-l procs=<number>      | --ntasks=<number>             | Number of processes to run
-l nodes=X:ppn=Y       | --ntasks=<X*Y>                | Number of processes to run (nodes multiplied by processes per node)
-l walltime=<HH:MM:SS> | --time=<HH:MM:SS>             | How long the job will run
-l mem=<number>        | --mem=<number>                | Total memory (single node), in MB by default
-l pmem=<number>       | --mem-per-cpu=<number>        | Memory per CPU, in MB by default
-l nodes=…:<attribute> | --constraint=<attribute>      | Node property to request (e.g., avx, IB)
-q <queue_name>        | --partition=<partition_name>  | Which set of nodes to run the job on
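
As a worked example of the table above, a PBS request for 2 nodes with 8 processes each, 4 hours of walltime, and 2 GB of memory per process translates roughly as follows (the script names are placeholders):

# PBS
qsub -l nodes=2:ppn=8 -l walltime=04:00:00 -l pmem=2gb job.pbs

# Slurm
sbatch --ntasks=16 --time=04:00:00 --mem-per-cpu=2GB job.slurm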

NOTE: When using salloc to submit interactive jobs, pass the resource options directly on the command line:

salloc <options>

Aside from the new syntax, Slurm places the emphasis on the programs you want to run: you specify the needs of your program (for example, memory, walltime, and CPUs) and Slurm finds the appropriate resources for it to run on.

If you want to run a program that uses 8 processes (which Slurm calls "tasks"), needs 16 gigabytes of memory in total, and takes 12 hours to complete, you might use the following job script directives:

#SBATCH --ntasks=8          # 8 tasks (processes)
#SBATCH --mem-per-cpu=2GB   # 2 GB per CPU, 16 GB in total
#SBATCH --time=12:00:00     # 12-hour walltime limit
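
Inside the job script, the program itself is typically launched with srun so that one copy starts per task; the executable name here is only a placeholder:

srun ./my_program           # launches 8 copies, one per task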

Cheat Sheet: Useful SBATCH Options

You may find it useful to receive email alerts, or to have Slurm help organize your log files, by adding one of the sbatch options below.

PBS             | Slurm                                     | Meaning
-M <email>      | --mail-user=<email>                       | Where to send email alerts
-m <a|b|e>      | --mail-type=<BEGIN|END|FAIL|REQUEUE|ALL>  | When to send email alerts
-o <out_file>   | --output=<out_file>                       | Name of the output file
-e <error_file> | --error=<error_file>                      | File name if a separate error log is desired
-N <job_name>   | --job-name=<job_name>                     | Job name
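
Put together, a job script header using these options might look like the sketch below (the job name, email address, and file names are placeholders):

#SBATCH --job-name=myjob
#SBATCH --mail-user=user@sdsu.edu
#SBATCH --mail-type=END,FAIL
#SBATCH --output=myjob.%j.out    # %j expands to the job ID
#SBATCH --error=myjob.%j.err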

*For more details on submitting a batch script to Slurm, see slurm.schedmd.com/sbatch.html.

Cheat Sheet: How to Read/Set Slurm Environment Variables

Like PBS, Slurm sets its own environment variables within your job:

PBS            | Slurm                | Meaning
$PBS_JOBID     | $SLURM_JOB_ID        | Job ID
$PBS_O_WORKDIR | $SLURM_SUBMIT_DIR    | Directory the job was submitted from
$PBS_NODEFILE  | $SLURM_JOB_NODELIST  | Allocated hostnames (PBS writes them to a file; Slurm stores the list in the variable itself)
$PBS_O_HOST    | $SLURM_SUBMIT_HOST   | Hostname the job was submitted from
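
For example, a job script might use these variables as follows (a minimal sketch):

cd $SLURM_SUBMIT_DIR        # return to the directory the job was submitted from
echo "Job $SLURM_JOB_ID is running on: $SLURM_JOB_NODELIST"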

In Slurm, environment variables are passed to your job by default. If you have environment variables set that you think might interfere with your job, you can either:

  • Log out, then log back in and submit your job
  • Run sbatch with one of these options to override the default behavior:
     sbatch --export=NONE          # start the job with a clean environment
     sbatch --export=TEST=3        # export only TEST (plus Slurm's own variables)
     sbatch --export=ALL,TEST=3    # export the full environment and also set TEST=3
    

Cheat Sheet: How to Monitor Your Job(s)

After submitting your job, you can check its status with these new commands:

PBS                   | Slurm                       | Meaning
qstat <-u username>   | squeue <-u username>        | Check jobs for a specific user
qstat -f <jobid>      | sstat <job_id>              | Display resource usage of a running job
pbsnodes <:attribute> | sinfo                       | Display all nodes (with attribute)
checkjob <job_id>     | scontrol show job <job_id>  | Status of a particular job
showstart <job_id>    | squeue -j <job_id> --start  | Get estimated start time of a job
qdel <job_id>         | scancel <job_id>            | Delete a job
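
For example (the job ID is a placeholder):

squeue -u $USER                        # all of your queued and running jobs
sstat -j 12345 --format=JobID,MaxRSS   # memory usage of a running job
scontrol show job 12345                # full details for a particular job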

Valid Job States

Below are the job states you may encounter when monitoring your job(s) in Slurm.

Code | State       | Meaning
CA   | Canceled    | Job was canceled
CD   | Completed   | Job completed
CF   | Configuring | Job resources are being configured
CG   | Completing  | Job is completing
F    | Failed      | Job terminated with a non-zero exit code
NF   | Node Fail   | Job terminated due to failure of node(s)
PD   | Pending     | Job is waiting for compute node(s)
R    | Running     | Job is running on compute node(s)
TO   | Timeout     | Job terminated upon reaching its time limit
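
These codes appear in the ST column of squeue output. For example, to list only your pending and running jobs:

squeue -u $USER --states=PENDING,RUNNING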


Sample submit.sh

#!/bin/bash
#
#SBATCH --job-name=wnguyen
#SBATCH --nodes=1                        # number of nodes
#SBATCH --ntasks=1                       # number of tasks (cores)
#SBATCH --mem=100                        # memory pool for all cores (MB)
#SBATCH --time=0-2:00                    # time (D-HH:MM)
#SBATCH --nodelist=node[19-21]           # CoE nodes with GPU
#SBATCH --mail-user=wnguyen@sdsu.edu
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --error=wnguyen.%J.err           # file name if a separate error log is desired
#SBATCH --output=wnguyen.%J.out          # name of output file

echo "SLURM_JOBID="$SLURM_JOBID
echo "SLURM_JOB_NODELIST="$SLURM_JOB_NODELIST
echo "SLURM_NNODES="$SLURM_NNODES
echo "SLURMTMPDIR="$SLURMTMPDIR

echo "working directory = "$SLURM_SUBMIT_DIR
cd /nas/scratch/william
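
Assuming the script above is saved as submit.sh, it can be submitted with the clean environment recommended earlier:

sbatch --export=NONE submit.sh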