Submitting and Running Compute Jobs
Computations are performed by the compute nodes via batch jobs submitted to the queue system.
Submitting Batch Jobs
Single Jobs
To submit a job to the batch system, use the sbatch
command, which takes a batch script as input. For example, a batch script myprogram.srun
might look like this:
#!/bin/bash
#SBATCH --qos main

srun myprogram
This script can then be submitted to the scheduler:
sbatch < myprogram.srun
This runs myprogram on a single processor on one of the cluster's compute nodes, in the same working directory the sbatch command was run from. The -p and --qos flags specify which partition and QOS (roughly equivalent to a queue on RC2) the job is submitted to. The available queues are currently described in the FAQ, and each provides a different priority level and different constraints on who may submit to it. You can view all partitions with sinfo and all QOSes with sacctmgr list qos. It is recommended that users generally use the main queue so that the scheduler can balance cluster resources optimally. See the SLURM documentation for many more options.
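As a sketch, a slightly fuller batch script might look like the following. The job name, output file, task count, and time limit here are illustrative values, not site defaults:

```shell
#!/bin/bash
#SBATCH --qos main       # submit to the main QOS
#SBATCH -J myjob         # job name (illustrative)
#SBATCH -o myjob.%j.out  # write stdout to myjob.<jobid>.out
#SBATCH -n 1             # run a single task
#SBATCH -t 01:00:00      # one-hour time limit (illustrative)

srun myprogram           # replace with your own executable
```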
Interactive Jobs
The login node, or "head" node, should not be used for computation, only for compiling and organizational tasks. If you need to do serious analysis or a large compilation, use an interactive session on a compute node instead. This is better for both you and others, since you will not be competing for the CPU and memory of the single head node. To do so, submit an interactive job:
srun --pty -p partition_name /bin/bash
This submits a job to the batch system in the named partition and stays attached to it in an interactive bash shell. The command does not return immediately like a normal batch submission; it waits until the job starts and then connects your terminal directly to it.
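For example, to request an interactive shell with a few cores and some dedicated memory (the partition name and resource amounts here are illustrative):

```shell
# Request an interactive bash shell on a compute node:
# 4 CPUs and 8 GB of memory in the main partition (illustrative values).
srun --pty -p main -c 4 --mem=8G /bin/bash
```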
To run an interactive GUI job:
- ssh -X uahpc.ua.edu (or use PuTTY and Xming, etc.)
Note: some users have trouble with "-X" failing X11 security checks. If this happens, you can substitute "-Y".
- Start an interactive job as shown above.
- You'll be dropped onto a compute node with X11 forwarding enabled. From there you can launch your program. For example, type "use matlab" followed by "matlab" to launch MATLAB interactively.
Multi-processor Jobs
It is also possible to submit jobs that execute across multiple compute nodes. This is discussed further in Using MPI on UAHPC.
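As a minimal sketch of a multi-node submission, assuming an MPI program named mympiprog built against the cluster's MPI stack (the node and task counts are illustrative):

```shell
#!/bin/bash
#SBATCH --qos main   # submit to the main QOS
#SBATCH -N 2         # request two nodes
#SBATCH -n 16        # 16 MPI tasks total (illustrative)

srun ./mympiprog     # srun launches one task per allocated slot
```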
Checking Jobs
Two commands to check the status of the batch queues are squeue -u username and sprio. squeue gives an overview of how much work is submitted and running under each queue. It produces output like the following:
JOBID PARTITION NAME     USER ST TIME NODES NODELIST(REASON)
 2313     debug bash tbanders  R 0:09     1 compute-0-3
To delete a job (say, 2313 above), type scancel 2313. sprio shows the priority of pending jobs.
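For example, to see only your own jobs and then a detailed priority listing for pending jobs (replace username with your own login name):

```shell
squeue -u username   # show only your jobs
sprio -l             # long-format priority breakdown for pending jobs
```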
The cluster-status Ganglia page also shows the current load on the cluster, and Slurmweb displays the status of queues and jobs much like squeue. Run slurmtop for a text-based display of cluster and job status.
Job Submission Scripts
These are run as:
sbatch script_file
Here are some examples:
#SBATCH -c xx # request xx CPUs (cores) per task
Fluent Example
#!/bin/bash
#SBATCH -n 4        # use 4 cores
#SBATCH --qos main  # use the main qos
#SBATCH -N 1        # only use 1 node (in this case, 4 cores on 1 node)
#SBATCH -J fluentP1 # name the job fluentP1
#SBATCH --mem=20G   # reserve 20G of memory

cd /home/tbanderson/jobdirectory
srun hostname -s > hosts.$SLURM_JOB_ID
export FLUENT_GUI=off
fluent 3d -t4 -pib -mpi=openmpi -cnf=hosts.$SLURM_JOB_ID -slurm -ssh -g -i journal_file.jou > fluentP1.out 2> fluentP1.err
rm hosts.$SLURM_JOB_ID
Reserving Memory
To reserve memory PER JOB, use the following syntax:
sbatch --mem=nnnn (where nnnn is the amount of RAM you want to reserve in megabytes)
OR, in a job submission file:
#SBATCH --mem=nnnn (where nnnn is the amount of RAM you want to reserve in megabytes)
To reserve memory PER CORE, use the following syntax:
sbatch --mem-per-cpu=nnnn (where nnnn is the amount of RAM you want to reserve in megabytes)
OR, in a job submission file:
#SBATCH --mem-per-cpu=nnnn (where nnnn is the amount of RAM you want to reserve in megabytes)
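Putting this together, a batch script might reserve memory per core like this (the task count and memory amount are illustrative; 2048 MB per core across 4 tasks gives 8 GB in total):

```shell
#!/bin/bash
#SBATCH -n 4                # run four tasks
#SBATCH --mem-per-cpu=2048  # 2048 MB per core, 8 GB total for this job

srun myprogram              # replace with your own executable
```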
To run another script after your job finishes, use --epilog:
#SBATCH --epilog="yourscripthere.sh"