HPC terminology
HPC terminology regarding nodes, cores, processors and tasks, taken from the LUNARC Aurora pages on the subject.
Term | Explanation | Fermi
---|---|---
node | A physical computer | 40
processor | A multi-core processor housing many processing elements | 2 per node
socket | The plug where a processor is placed; often used as a synonym for the processor | 2 per node
core | An individual processing element | 6-16 per node
task | A software process with its own data and instructions that can fork multiple threads | Specified in the sbatch script
thread | An instruction stream sharing data with other threads from the same task | Specified in the sbatch script
Table 1. Terminology regarding resources as defined in the HPC community
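As an illustration of how this terminology maps onto a SLURM resource request, the sketch below asks for one node running two tasks with eight threads each. The --ntasks and --cpus-per-task options are standard SLURM flags not listed in the tables here, and the numbers are placeholders to be adjusted to the actual node.

```
#!/bin/sh
# Sketch only: illustrative numbers, adjust to the node described in Table 1
#SBATCH --nodes=1            # one physical computer (node)
#SBATCH --ntasks=2           # two tasks (independent software processes)
#SBATCH --cpus-per-task=8    # eight threads per task, i.e. 16 cores in total
#SBATCH -t 00:30:00          # maximum runtime, required before scheduling
```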
HPC vs XDS terminology
HPC term | XDS terminology
---|---
tasks | JOBS (i.e. MAXIMUM_NUMBER_OF_JOBS)
threads | PROCESSORS (i.e. MAXIMUM_NUMBER_OF_PROCESSORS)
Table 2. HPC vs XDS terminology
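As a sketch of how this maps onto an XDS.INP file, the lines below split the work into four jobs with four processors each. The numbers are only illustrative and should be chosen so that JOBS times PROCESSORS does not exceed the number of cores allocated to the job.

```
! Illustrative XDS.INP lines - keep JOBS x PROCESSORS within the allocated cores
MAXIMUM_NUMBER_OF_JOBS=4        ! number of tasks
MAXIMUM_NUMBER_OF_PROCESSORS=4  ! number of threads per task
```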
Useful HPC commands
HPC command | Consequence
---|---
interactive -N 1 --exclusive -t 00:30:00 -A snic2018-3-251 | Get a 30 min terminal window on a compute node, charged to the project
interactive --nodes=1 --exclusive -t 00:30:00 -A snic2018-3-25 | As above, using the long option names
exit | Leave the terminal window and stop charging compute time to the project
squeue -u x_user | Check my jobs running or waiting in the SLURM queue
top | See all jobs running on the current node
top -U username | See my jobs running on the current node
scancel JOBID | Kill my job with JOBID
module load XDSAPP | Load XDSAPP and dependencies CCP4, PHENIX, XDS, XDSTAT…
module avail | List the available modules
module purge | Unload all modules
Table 3. Basic HPC commands
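As a sketch of how these commands fit together, an interactive session could look like the following; the project ID and time limit are copied from Table 3 and should be replaced with your own.

```
# Request a 30 min interactive shell on one full compute node
interactive -N 1 --exclusive -t 00:30:00 -A snic2018-3-251

# On the compute node: load XDSAPP and its dependencies, check the queue
module load XDSAPP
squeue -u x_user

# When done, leave the node so the project is no longer charged
exit
```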
Tetralith and Aurora differences
Tetralith vs Aurora | Outcome
---|---
jobsh n1024 | Access a node while it is in use at NSC Tetralith
ssh au118 | Access a node while it is in use at LUNARC Aurora
Table 4. Commands that differ between LUNARC Aurora and NSC Tetralith
SBATCH and checking the queue
Running sbatch scripts is the most efficient way of using HPC compute time, since the clock counting compute time stops as soon as the job is finished. Every sbatch script requires a maximum time for the job, e.g. #SBATCH -t 00:30:00, before the job can be scheduled into the queue. To check the status of jobs submitted by sbatch, use squeue -u username and you obtain output like that in Figure 1.
Figure 1. Output of squeue -u username. The first column gives the job ID, the second the partition (or queue) where the job was submitted, the third the name of the job (specified by the user in the submission script) and the fourth the owner of the job. The fifth is the status of the job (R=running, PD=pending, CA=cancelled, CF=configuring, CG=completing, CD=completed, F=failed). The sixth column gives the elapsed time for each particular job. Finally, there are the number of nodes requested and the nodelist where the job is running (or the reason why it is not yet running).
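A minimal sketch of this submission workflow, assuming a script with the hypothetical name myjob.sh in the current directory:

```
# Submit the script to the SLURM queue; the job ID is printed on submission
sbatch myjob.sh

# Check the status of all your jobs (R = running, PD = pending, see Figure 1)
squeue -u username

# Cancel a job if needed, using the job ID from the first column
scancel JOBID
```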
Now it is possible to access the compute nodes and check the job status in more detail using top or top -U username. This is done in two different ways at NSC Tetralith and LUNARC Aurora:
- At LUNARC Aurora use:
ssh au118
- At NSC Tetralith use:
jobsh n1024
Once your terminal window is on the compute node you can check the status with top
Figure 2. Output of top given on the compute node. The status of each process is indicated by its state (D=uninterruptible sleep, R=running, S=sleeping, T=traced or stopped, Z=zombie)
or with top -U username
Figure 3. Output of top -U username given on the compute node
The interactive command can use the same parameters as sbatch, listed in Table 5 below.
SBATCH script line | Consequence
---|---
#!/bin/sh | Use sh to interpret the script
#SBATCH -t 0:30:00 | Run the script for a maximum of 30 min
#SBATCH --nodes=2 --exclusive | Allocate two full nodes for this script
#SBATCH -A snic2017-1-XXX | Count compute time on project snic2017-1-XXX
#SBATCH --mail-type=ALL | Send email when the job starts and stops
#SBATCH --mail-user=name.surname@lu.se | Send email to name.surname@lu.se
Table 5. SBATCH script lines. The interactive command uses the same options, but they are usually given on a single line at the login node, e.g. interactive --nodes=2 --exclusive -t 00:30:00 -A snic2017-1-XXX
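Putting the lines of Table 5 together, a complete sbatch script for an XDS run could look like the sketch below. The project ID and e-mail address are the placeholders from Table 5, and xds_par is assumed to be the parallel XDS executable made available by module load XDSAPP.

```
#!/bin/sh
#SBATCH -t 0:30:00                       # maximum 30 min runtime
#SBATCH --nodes=1 --exclusive            # one full node (Table 5 shows how to ask for two)
#SBATCH -A snic2017-1-XXX                # charge compute time to this project
#SBATCH --mail-type=ALL                  # email when the job starts and stops
#SBATCH --mail-user=name.surname@lu.se   # address for the notifications

# Load XDS and its dependencies via the XDSAPP module (see Table 3)
module load XDSAPP

# Run parallel XDS; assumes an XDS.INP file in the directory the job was submitted from
xds_par
```

Submit the script with sbatch and follow it with squeue -u username as described above.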
Useful LINUX commands
- rsync -rvplt ./data username@aurora.lunarc.lu.se:/lunarc/nobackup/users/username/. copies the data directory to LUNARC
- scp file.pdb username@aurora.lunarc.lu.se:/lunarc/nobackup/users/username/. copies the single file file.pdb to LUNARC
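Results can be copied back the other way with the same tools; in the sketch below the remote directory name data is a placeholder.

```
# Copy a results directory from LUNARC back to the current directory on the local machine
rsync -rvplt username@aurora.lunarc.lu.se:/lunarc/nobackup/users/username/data ./
```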