Submitting and Running Compute Jobs
Computations are performed by the compute nodes via batch jobs submitted to the queue system.
Submitting Batch Jobs
In order to submit a job to the batch system you use the
sbatch command, which takes a batch script as input. For example, a batch script
myprogram.srun might look like this:
#!/bin/bash
#SBATCH --qos main
srun myprogram
This script can then be submitted to the scheduler:
sbatch < myprogram.srun
This will run myprogram on a single processor on one of the compute nodes of the cluster, in the same working directory the sbatch command was run from. The --qos flag specifies which QOS (roughly equivalent to a queue on RC2) the job is submitted to. The available queues are currently described in the FAQ, and each provides different priority and constraints to those allowed to submit to it. You can view all partitions with sinfo, and you can see all QOSes with sacctmgr list qos. It is recommended that users generally use the main queue so that the queue software can optimally balance the usage of the cluster resources. See the SLURM documentation for many more options.
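As a sketch, a slightly fuller submission script using the main QOS might look like the following; the job name, output file names, and program name are placeholders, not site-specific values:

```shell
#!/bin/bash
#SBATCH --qos main        # submit to the main QOS, as recommended above
#SBATCH -J myjob          # job name as shown in squeue (placeholder)
#SBATCH -n 1              # a single task/processor
#SBATCH -o myjob.%j.out   # stdout file; %j expands to the job ID
#SBATCH -e myjob.%j.err   # stderr file

srun ./myprogram          # placeholder program name
```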
The login node or "head" node should not be used for computation, only for compiling and organizational tasks. If you need to do serious analysis or a large compilation, use an interactive session on a compute node instead. This is better for both you and others, since you will not be competing for the resources, both CPU and memory, of the single head node. This can be done by submitting an interactive job:
srun --pty -p partition_name /bin/bash
This will submit a job to the batch system in the named partition and stay attached to it in interactive shell mode with the bash shell. This command will not complete immediately like a normal batch submission; it will wait until the job starts and then connect your terminal directly to it.
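For example, assuming a partition named main exists on this cluster (substitute your site's actual partition name), an interactive session could be requested like this:

```shell
# Basic interactive bash shell on the "main" partition (name assumed).
srun --pty -p main /bin/bash

# The same, but reserving 4 cores and 8G of memory for the session.
srun --pty -p main -c 4 --mem=8G /bin/bash
```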
To run an interactive GUI job:
- ssh -X uahpc.ua.edu (or use putty and xming, etc)
Note: some users have trouble with "-X" failing X11 security checks. If this happens, you can substitute "-Y".
- Start an interactive job as shown above.
- You'll be dropped onto a compute node with X11 forwarding enabled. From there you can launch your program. For example, type "use matlab" followed by "matlab" to launch Matlab interactively.
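The GUI steps above can be sketched end to end; note that the --x11 flag is an assumption about this SLURM build (native X11 forwarding through srun), and partition_name is a placeholder:

```shell
# 1. Log in with X11 forwarding (substitute -Y if -X fails security checks).
ssh -X uahpc.ua.edu

# 2. Start an interactive job with X11 forwarded to the compute node.
srun --pty -p partition_name --x11 /bin/bash

# 3. On the compute node, set up and launch the GUI program.
use matlab
matlab
```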
Among the other options, it is possible to submit jobs that will execute across multiple compute nodes. This is discussed further in Using MPI on UAHPC.
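As a sketch of what a multi-node submission can look like (the node and task counts and the program name are placeholders; see Using MPI on UAHPC for the site-specific details):

```shell
#!/bin/bash
#SBATCH --qos main
#SBATCH -N 2                 # spread the job across two nodes
#SBATCH --ntasks-per-node=8  # 8 MPI ranks per node, 16 total

# srun starts one copy of the program per task across both nodes.
srun ./my_mpi_program
```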
Two commands to check the status of the batch queues are squeue -u username and squeue. squeue gives an overview of how much work is submitted and running under each queue. It produces output like the following:
JOBID PARTITION NAME USER     ST TIME NODES NODELIST(REASON)
 2313 debug     bash tbanders R  0:09     1 compute-0-3
To delete a job (say, 2313 above), type:
scancel 2313
sprio will show job priority on pending jobs.
Also, the cluster status Ganglia page shows current load on the cluster, and Slurmweb displays the status of queues and jobs. Try slurmtop for a text-based display of cluster and job status.
Job Submission Scripts
These are run as:
sbatch scriptname
Here are some examples:
#SBATCH -c xx # xx cpus per task
#!/bin/bash
#SBATCH -n 4        # use 4 cores
#SBATCH --qos main  # use the main qos
#SBATCH -N 1        # only use 1 node; in this case, 4 cores on 1 node
#SBATCH -J fluentP1 # name the job fluentP1
#SBATCH --mem=20G   # reserve 20G of memory
cd /home/tbanderson/jobdirectory
srun hostname -s > hosts.$SLURM_JOB_ID
export FLUENT_GUI=off
fluent 3d -t4 -pib -mpi=openmpi -cnf=hosts.$SLURM_JOB_ID -slurm -ssh -g -i journal_file.jou > fluentP1.out 2> fluentP1.err
rm hosts.$SLURM_JOB_ID
To reserve memory PER JOB, use the following syntax:
sbatch --mem=nnnn (where nnnn is the amount of RAM you want to reserve in megabytes)
OR, in a job submission file:
#SBATCH --mem=nnnn (where nnnn is the amount of RAM you want to reserve in megabytes)
To reserve memory PER CORE, use the following syntax:
sbatch --mem-per-cpu=nnnn (where nnnn is the amount of RAM you want to reserve in megabytes)
OR, in a job submission file:
#SBATCH --mem-per-cpu=nnnn (where nnnn is the amount of RAM you want to reserve in megabytes)
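For instance, combining --mem-per-cpu with a core count, a minimal sketch of a job script (the program name is a placeholder) would reserve 8000 MB in total:

```shell
#!/bin/bash
#SBATCH -n 4               # 4 cores
#SBATCH --mem-per-cpu=2000 # 2000 MB per core, so 8000 MB for the whole job

srun ./myprogram           # placeholder program name
```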
To run another script after your job finishes, use the --epilog option to srun.
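A minimal sketch, assuming a user-provided cleanup script (the script path and program name here are hypothetical):

```shell
# After the job step finishes, SLURM runs the named epilog script on the node.
srun --epilog=$HOME/cleanup.sh ./myprogram
```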