Choosing a version

The following choices are available through dotkit:

  • use IntelCompilers – Intel MPI Library 4.0 U1, installed in /share/apps/intel/impi/4.0.1.007/intel64/
  • use mvapichGCCShared – MVAPICH2 1.9a2, preferring shared libraries, built with gcc and installed in /usr/mpi/gcc/mvapich2-1.9a2 – not verified to work with SLURM
  • use mvapichGCCStatic – MVAPICH2 1.9a2, preferring static libraries, built with gcc and installed in /usr/mpi/gcc/mvapich2-1.9a2 – not verified to work with SLURM
  • use openmpiSLURM – OpenMPI 1.10.2, built with gcc 4.8.4
  • use openmpiIntel – OpenMPI 1.6.4, built with Intel compiler and installed in /share/apps/openmpi/Intel/1.6.4 – not verified to work with SLURM

It is important to ensure that your software is built against the same MPI version you use to run it with srun. The dotkit setup above will place the appropriate versions of mpicc, mpirun, etc., in your PATH, and set LD_LIBRARY_PATH to find the correct MPI libraries.
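
For example, after loading one of these dotkits you can check that the wrappers and libraries on your path come from the MPI installation you expect (the source file name below is just a placeholder):

use openmpiSLURM                    # or IntelCompilers, mvapichGCCShared, ...
which mpicc mpirun                  # should point into the chosen MPI installation
echo $LD_LIBRARY_PATH               # should include that MPI's lib directory
mpicc -o my_program my_program.c    # build against the same MPI you will run with srun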

Running under SLURM

Just as you don’t ssh directly into compute nodes and start processes, please don’t invoke mpirun directly on the cluster. Not all jobs running on UAHPC are MPI jobs, but all jobs should be scheduled with SLURM to ensure fair use of cluster resources. When you submit a job, SLURM may allocate processors across several nodes, and your MPI job must use exactly the processors on the nodes that SLURM allocated to it.
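
If you want to see what SLURM actually gave your job, the allocation is exposed inside the job through environment variables, for example:

echo $SLURM_JOB_NODELIST   # the nodes allocated to this job
echo $SLURM_NTASKS         # the number of tasks requested with -n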

It is possible to write wrapper scripts that read the SLURM allocation from these environment variables and then invoke mpirun with the correct list of nodes. However, srun can launch the tasks on the allocated nodes itself, with no mpirun involved. We recommend the following method of scheduling MPI jobs:

Here is an example using Intel MPI:

#!/bin/bash
#SBATCH --mem-per-cpu 4000
#SBATCH -n 64
#SBATCH -o /some/dir/output.log
#SBATCH --qos main

# srun starts the 64 requested tasks on the nodes SLURM allocated; no mpirun needed
srun your_commands_here

And here is an example using OpenMPI:

#!/bin/bash
#SBATCH --mem-per-cpu 4000
#SBATCH -n 64
#SBATCH -o /some/dir/output.log
#SBATCH --qos main

# --mpi=pmi2 tells srun which process-management interface to use when launching this OpenMPI build
srun --mpi=pmi2 your_commands_here
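
Whichever MPI you use, submit the script with sbatch and monitor it with the usual SLURM tools (the script name below is just a placeholder):

sbatch my_mpi_job.sh
squeue -u $USER            # check that the job is pending or running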