High Performance Computing

Sun Grid Engine specifics on chadwick

Introduction

On the chadwick cluster, applications are run on the compute nodes by submitting tasks in the form of jobs to the Sun Grid Engine (SGE) scheduler. If you are new to this, you may want to consult the basic information provided in the getting started guide.

The chadwick setup enforces strict limits on the resources that jobs can use (e.g. memory and the number of cores), and for most applications you will need to specify resource requirements in your job submission file(s). For applications that use MPI or SMP parallelisation you will also need to specify which parallel environment is required using the -pe option.

In SGE job scripts, resource requirements are specified using the -l option followed by one or more resource specifications of the form resource=requirement. For example, this job script would request that the job can run for 10 hours with 8 GB of memory:

#!/bin/bash

# -cwd runs the job in the current working directory; -V exports
# the current environment variables to the job
#$ -cwd -V
#$ -l h_rt=10:00:00
#$ -l h_vmem=8G

Rscript testit.R
            
This example is functionally equivalent, with the resource requirements placed in a comma-separated list on a single line:
#!/bin/bash

#$ -cwd -V 
#$ -l h_rt=10:00:00,h_vmem=8G

Rscript testit.R
            
Alternatively, you can specify SGE options directly on the qsub command line, e.g.:
$ qsub -l h_rt=10:00:00,h_vmem=8G test_script.sh
            
Note that the run time value is given in the form hh:mm:ss (where hh stands for hours, mm minutes and ss seconds).
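
For example, a shorter run time limit of 30 minutes would be written as:

-l h_rt=0:30:00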


Job Run Times

By default, jobs submitted to the cluster have a maximum run time of 8 hours; once this time has elapsed, the job will be terminated immediately. Many applications will require longer than this, so it is very important to specify a maximum run time in your job submission script via the h_rt resource. This would indicate a maximum run time of 5 days (120 hours):

-l h_rt=120:00:00
            
Five days is the absolute limit for chadwick jobs although it may be possible to extend this in exceptional circumstances (email arc-support@liverpool.ac.uk for advice).
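
Putting this together, a minimal job script for a long-running job might look like the following sketch (the application command line is a placeholder):

#!/bin/bash

#$ -cwd -V
#$ -l h_rt=120:00:00

LongRunningApplication options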


CPU Types

You can specify which type of CPU you want jobs to run on by using the cputype resource. For example, this:

-l cputype=sandybridge
            
would ensure that jobs only run on the Sandybridge nodes. For the Westmeres, use -l cputype=westmere instead.
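
As with other resources, cputype can be combined with further requests in a comma-separated list. For example, this (hypothetical) submission would request 24 hours on a Sandybridge node:

$ qsub -l h_rt=24:00:00,cputype=sandybridge test_script.sh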


Memory Usage

On chadwick, the SGE scheduler rigorously enforces memory usage limits to prevent jobs owned by one user taking excess memory to the detriment of other users' jobs. The default limit is 4GB (per core) so if you want to run serial jobs that require a large amount of memory you will need to request this specifically using the h_vmem resource. For example this would request 64 GB of memory:

-l h_vmem=64G
            
and this would do the same:
-l h_vmem=64000M
            
Note that there is no B in the units, i.e. use G (not GB) for gigabytes and M (not MB) for megabytes.

The memory for parallel jobs (SMP and MPI) is given per core, not as the total amount of memory taken by all threads/processes. Note also that the Sandybridge nodes have 64 GB and the Westmeres 48 GB. If you require more than this (possibly up to 2 TB) then the job will need to run on the Large Memory node (comp00). It may take a long time for enough resources to become free for large memory jobs to run, so contact the ARC team for advice first (email: arc-support@liverpool.ac.uk).
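
Since h_vmem is per core, the total memory allocated to a parallel job is the per-core value multiplied by the number of cores. As a sketch, this hypothetical SMP job (the -pe smp option is described in the next section) would be allocated 8 x 6 GB = 48 GB in total:

#!/bin/bash

#$ -cwd -V
#$ -pe smp 8
#$ -l h_vmem=6G

SMPapplication options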


Parallel Environments (SMP and MPI)

The way in which the number of cores required to run a particular job is specified depends on whether it uses SMP or MPI parallelism. For SMP jobs, the maximum number of cores is given using the -pe smp option. For example, this job script:

#!/bin/bash

#$ -cwd -V 
#$ -pe smp 8

SMPapplication options
            
would request eight processor cores (on the same node) to run the job. The Sandybridge nodes have 16 cores each and the Westmeres 12 cores each. If your job requires more than this it will need to run on the Large Memory node (comp00). Jobs may wait a long time for resources to become free for large core jobs so contact the ARC team for advice first (email: arc-support@liverpool.ac.uk).
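
Many SMP applications also need to be told how many threads to use. SGE sets the environment variable NSLOTS to the number of cores granted, so a sketch like the following (an OpenMP application is assumed here) avoids hard-coding the core count:

#!/bin/bash

#$ -cwd -V
#$ -pe smp 8

# NSLOTS is set by SGE to the number of cores granted (8 here)
export OMP_NUM_THREADS=$NSLOTS
SMPapplication options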

For applications that use MPI parallelism, the -pe mpi option is needed. For example, this job script:

#!/bin/bash

#$ -cwd -V
#$ -pe mpi 16 

MPIapplication options
            
would run an MPI application on 16 cores. The SGE scheduler will attempt to "fill up" a single node before running processes across multiple nodes. Ideally, then, the above job will run on a single Sandybridge node; however, this cannot be guaranteed, and if the cluster is busy the MPI processes may be scattered across multiple nodes to the possible detriment of performance. Clearly, for large core counts (> 24) it will be essential to distribute the MPI processes across multiple nodes.
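
The exact command used to launch the MPI processes will depend on the MPI installation on chadwick, but a typical script is sketched below; mpirun and the application command line are assumptions, while NSLOTS is set by SGE to the number of cores granted:

#!/bin/bash

#$ -cwd -V
#$ -pe mpi 16

# Launch one MPI process per core granted by SGE
mpirun -np $NSLOTS MPIapplication options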


Targeting the Large Memory and GPU nodes

You can ensure that jobs run on the Large Memory node by using the -l h option with the hostname set to comp00, viz:

#!/bin/bash

#$ -cwd -V
#$ -l h=comp00

LargeMemoryApplication options
            
If you specify a large memory requirement or core count, jobs may end up on the Large Memory node out of necessity in any case, as the other nodes do not provide sufficient resources.

To run jobs only on the GPU visualisation nodes (visu1 and visu2), specify the node requirement as below:

-l h="visu*" 
            

Licensed Applications

In order to run a licensed application, you will need to request a certain number of license tokens (or "seats") for the application to run. The exact number will depend on how many processes/threads the job is to use and varies from one application to another (a word of warning: it is rarely one token per core). The resource request is of the form -l application=tokens, so for example this:

-l abaqus=5
            
would request five tokens to run an Abaqus job. If you need to run a licensed application, it is best to stick to using the ARC "run scripts" or contact the ARC team for advice (email arc-support@liverpool.ac.uk).
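
For illustration only, a minimal Abaqus job script combining a run time limit with the license request might look like the following sketch (the application command line is a placeholder; in practice, use the ARC run scripts):

#!/bin/bash

#$ -cwd -V
#$ -l h_rt=10:00:00
#$ -l abaqus=5

AbaqusApplication options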


Useful SGE commands

qsub scriptfile submit an SGE job using the job script called scriptfile
qsub -t start:stop scriptfile submit an array job with indices starting at start and finishing at stop

qstat list your own jobs in the SGE batch queue
qstat -u '*' list all users' jobs in the SGE batch queue
qstat -j job-ID give detailed information on the job with ID job-ID

qdel job-ID delete/remove job with ID job-ID
qdel -u username delete/remove all jobs belonging to username

qstat -s z list recently completed/terminated jobs
qacct -j job-ID list detailed information on a completed/terminated Job
qacct -j job-ID | grep failed find out why a job failed (usually ran out of time or memory)
qacct -j job-ID | grep ru_wallclock find out how long a job ran for (wallclock time)
qacct -j job-ID | grep maxvmem find the maximum amount of memory used by a job

qhost list all chadwick compute nodes and their current status

qlic give a summary of license use (e.g. Fluent, Abaqus etc.)
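
As a sketch of a typical workflow, assuming a job script called test_script.sh for which the scheduler returns job ID 12345:

$ qsub test_script.sh
$ qstat
$ qacct -j 12345 | grep maxvmem

The first command submits the job, qstat shows its place in the queue, and qacct reports the peak memory use once the job has finished.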