Submitting and Running Compute Jobs

Computations are performed by the compute nodes via batch jobs submitted to the queue system.

SLURM Documentation

Submitting Batch Jobs

Single Jobs

To submit a job to the batch system, use the sbatch command, which takes a batch script as input. For example, a batch script myprogram.srun might look like this:

#!/bin/bash
#SBATCH --qos main

srun myprogram

This script can then be submitted to the scheduler:

sbatch < myprogram.srun

This runs myprogram on a single processor on one of the compute nodes, using the directory from which sbatch was run as the working directory. The -p and --qos flags specify which partition and QOS (roughly equivalent to a queue on RC2) the job is submitted to; the script above sets only the QOS. The available queues are currently described in the FAQ, and each offers different priorities and constraints to the users allowed to submit to it. You can view all partitions with sinfo and all QOSes with sacctmgr list qos. Users should generally use the main queue so that the queue software can balance the usage of cluster resources optimally. See the SLURM documentation for many more options.
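As a sketch, the same script with a few other commonly used sbatch options filled in might look like the following. Only --qos main and myprogram come from the example above; the job name, output file pattern and time limit are illustrative assumptions, not site defaults. The snippet writes the script to myprogram.srun and submits it only if sbatch is on the PATH.

```shell
# Write a batch script; every #SBATCH value except --qos main is an illustrative assumption.
cat > myprogram.srun <<'EOF'
#!/bin/bash
#SBATCH --qos main              # QOS from the example above
#SBATCH -J myprogram            # job name shown by squeue
#SBATCH -o myprogram.%j.out     # output file; %j expands to the job ID
#SBATCH -t 01:00:00             # wall-clock limit of one hour

srun myprogram
EOF

# Submit only where SLURM is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch myprogram.srun
else
    echo "sbatch not found; wrote myprogram.srun without submitting"
fi
```

Unless -e is also given, both standard output and standard error of the job go to the file named by -o.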

Interactive Jobs

The login node or “head” node should not be used for computation, only for compiling and organizational tasks. If you need to do serious analysis or a large compilation, use an interactive session on a compute node instead. This is better for both you and others, since you will not be competing for the CPU and memory of the single head node. To do so, submit an interactive job:

srun --pty -p partition_name /bin/bash

This will submit a job to the batch system in the partition you name and stay attached to the job in interactive shell mode with the bash shell. Unlike a normal batch submission, this command does not return immediately: it waits until the job starts and then connects your terminal directly to it.
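If you start interactive sessions often, the command above can be wrapped in a small shell function. The sketch below is a hypothetical helper (the name ishell and its default values are assumptions, not something installed on the cluster); it adds the standard srun options -c, --mem and -t so the session asks for cores, memory and a time limit explicitly.

```shell
# Hypothetical helper for ~/.bashrc: ishell [partition] [cores] [memory] [time]
# partition_name is the placeholder used above; replace it with a real partition.
ishell() {
    srun --pty \
         -p "${1:-partition_name}" \
         -c "${2:-1}" \
         --mem="${3:-2G}" \
         -t "${4:-01:00:00}" \
         /bin/bash
}
```

For example, ishell partition_name 4 8G 02:00:00 would request four cores, 8 GB of memory and a two-hour limit.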

To run an interactive GUI job:

  1. Connect with ssh -X (or use PuTTY and Xming, etc.).
    Note: some users have trouble with "-X" failing Xwin security.
    If this happens, you can substitute "-Y".
  2. Start an interactive job as shown above.
  3. You’ll be dropped onto a compute node with X11 forwarding enabled. From there you can launch your program. For example, type “use matlab” followed by “matlab” to launch MATLAB interactively.

Multi-processor Jobs

Among the other options, it is possible to submit jobs that execute across multiple compute nodes. This is discussed further in Using MPI on UAHPC.
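As a minimal sketch of a multi-node request (not an MPI example), the script below asks for two nodes with four tasks each and has srun start one copy of hostname per task. The file name multinode.srun and the node and task counts are illustrative assumptions.

```shell
# Write a two-node test script; the counts are illustrative.
cat > multinode.srun <<'EOF'
#!/bin/bash
#SBATCH --qos main
#SBATCH -N 2                    # two nodes
#SBATCH --ntasks-per-node=4     # four tasks on each node, eight in total

# srun starts one copy of the command per task;
# the counts show how the tasks were spread over the nodes.
srun hostname -s | sort | uniq -c
EOF
```

Submit it with sbatch multinode.srun; the job's output file should then list each allocated node name with a count of the tasks that ran there.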

Checking Jobs

Two commands to check the status of the batch queues are squeue -u username and sprio. squeue lists the jobs that are submitted and running (with -u username, only that user's jobs). It produces output like the following:

  2313     debug bash tbanders R     0:09     1 compute-0-3

To delete a job (say, 2313 above), type scancel 2313.

sprio shows the priority of pending jobs.

The cluster status Ganglia page also shows the current load on the cluster, and Slurmweb displays the status of queues and jobs, much like squeue. Run slurmtop for a text-based display of cluster and job status.
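To make the columns of the squeue sample line concrete, the sketch below splits it into the fields of squeue's default layout (job ID, partition, name, user, state, elapsed time, node count, node list). The line is hard-coded here so the snippet runs anywhere; on the cluster it would come from squeue -h -u username, where -h suppresses the header.

```shell
# The sample line from above; on the cluster: line=$(squeue -h -u username | head -n 1)
line='  2313     debug bash tbanders R     0:09     1 compute-0-3'

# Split on whitespace into the default squeue columns.
set -- $line
jobid=$1 partition=$2 name=$3 user=$4 state=$5 elapsed=$6 nodes=$7 nodelist=$8

echo "job $jobid ($name) in partition $partition: state=$state, running for $elapsed on $nodelist"
```

A state of R means the job is running; PD means it is still pending, which is when sprio is useful.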

Job Submission Scripts

Job submission scripts are submitted with:

sbatch script_file

Here are some examples:

#SBATCH -c xx                                       # xx cores per task
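A sketch of how -c is typically used for a multi-threaded program follows. It assumes myprogram is threaded with OpenMP, which the original does not state; the file name threaded.srun and the core count are illustrative.

```shell
# Write a single-task, four-core job script; the values are illustrative.
cat > threaded.srun <<'EOF'
#!/bin/bash
#SBATCH -n 1                    # one task
#SBATCH -c 4                    # four cores for that task
#SBATCH --qos main

# SLURM sets SLURM_CPUS_PER_TASK when -c is given; fall back to 1 otherwise.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Pass the core count to srun explicitly so the step gets all four cores.
srun -c "$OMP_NUM_THREADS" myprogram
EOF
```

Submit it with sbatch threaded.srun. Tying the thread count to SLURM_CPUS_PER_TASK keeps the program from starting more threads than the cores it was given.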

Fluent Example

#!/bin/bash
#SBATCH -n 4                                        # run 4 tasks (one core each by default)
#SBATCH --qos main                                  # use the main qos
#SBATCH -N 1                                        # only use 1 node, so all 4 cores are on 1 node
#SBATCH -J fluentP1                                 # name the job fluentP1
#SBATCH --mem=20G                                   # reserve 20G of memory

cd /home/tbanderson/jobdirectory
srun hostname -s > hosts.$SLURM_JOB_ID
export FLUENT_GUI=off
fluent 3d -t4 -pib -mpi=openmpi -cnf=hosts.$SLURM_JOB_ID -slurm -ssh -g -i journal_file.jou > fluentP1.out 2> fluentP1.err
rm hosts.$SLURM_JOB_ID

Reserving Memory

To reserve memory PER NODE (for a single-node job this is the same as per job), use the following syntax:

sbatch --mem=nnnn (where nnnn is the amount of RAM you want to reserve in megabytes)

OR, in a job submission file:

#SBATCH --mem=nnnn  (where nnnn is the amount of RAM you want to reserve in megabytes)

To reserve memory PER CORE, use the following syntax:

sbatch --mem-per-cpu=nnnn (where nnnn is the amount of RAM you want to reserve in megabytes)

OR, in a job submission file:

#SBATCH --mem-per-cpu=nnnn  (where nnnn is the amount of RAM you want to reserve in megabytes)
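The two styles of request can be related with a little arithmetic. As a sketch, take the four single-core tasks of the Fluent example and an illustrative --mem-per-cpu value of 5120 megabytes (an assumption chosen for the example); the total then matches that script's --mem=20G.

```shell
ntasks=4               # matches "#SBATCH -n 4" in the Fluent example
mem_per_cpu_mb=5120    # illustrative value for --mem-per-cpu, in megabytes

# One core per task, so the total is tasks times memory per core.
total_mb=$((ntasks * mem_per_cpu_mb))
total_gb=$((total_mb / 1024))

echo "--mem-per-cpu=${mem_per_cpu_mb} with ${ntasks} single-core tasks reserves ${total_mb} MB (${total_gb} GB) in total"
```

Use one style or the other in a given job: SLURM treats --mem and --mem-per-cpu as mutually exclusive.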

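To run another script after your job finishes, use --epilog, giving the path to that script between the quotes:

#SBATCH --epilog=""

Standard SLURM documents --epilog as an option of srun rather than of sbatch, so if the directive above is rejected, pass --epilog to the srun line inside the batch script instead.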