Choosing a version
The following choices are available through dotkit:
- use IntelCompilers
  – Intel MPI Library 4.0 U1, installed in /share/apps/intel/impi/4.0.1.007/intel64/
- use mvapichGCCShared
  – MVAPICH2 1.9a2, preferring shared libraries, built with gcc and installed in /usr/mpi/gcc/mvapich2-1.9a2
  – not verified to work with SLURM
- use mvapichGCCStatic
  – MVAPICH2 1.9a2, preferring static libraries, built with gcc and installed in /usr/mpi/gcc/mvapich2-1.9a2
  – not verified to work with SLURM
- use openmpiSLURM
  – OpenMPI 1.10.2, built with gcc 4.8.4
- use openmpiIntel
  – OpenMPI 1.6.4, built with the Intel compiler and installed in /share/apps/openmpi/Intel/1.6.4
  – not verified to work with SLURM
It is important to ensure that your software is built against the same MPI version that you use to run it with srun. The dotkit setup above will place the appropriate versions of mpicc, mpirun, etc., in your PATH, and set LD_LIBRARY_PATH so that the correct MPI libraries are found.
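As a rough illustration, loading one of these dotkits amounts to prepending the package's directories to the search paths. The sketch below uses the openmpiIntel install prefix listed above; the bin/ and lib/ subdirectories are assumed from a conventional MPI install layout, and the actual dotkit may set additional variables.

```shell
# Sketch of what a dotkit such as openmpiIntel arranges (bin/ and lib/
# subdirectories under the listed prefix are assumed, not confirmed):
MPI_PREFIX=/share/apps/openmpi/Intel/1.6.4
export PATH="$MPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$MPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

With the paths set this way, `which mpicc` and `which mpirun` should point into the chosen install, which is a quick way to confirm you are building and running against the same MPI.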
Running under SLURM
Just as you don’t ssh directly into compute nodes and start processes, please don’t invoke mpirun directly on the cluster. Not all jobs running on UAHPC are MPI jobs, but all jobs should be scheduled with SLURM in order to ensure fair use of cluster resources. When you submit a job, SLURM may allocate processors on several different nodes, and you have to ensure that your MPI job actually uses the correct number of processors on the nodes that SLURM allocated to it.
It is possible to write wrapper scripts that interpret the SLURM allocation through environment variables, and then invoke mpirun with the correct list of nodes where it should run your software. However, SLURM already knows how to do this without using mpirun. We recommend the following method of scheduling MPI jobs:
Here is an example using Intel MPI:
#!/bin/bash
#SBATCH --mem-per-cpu 4000
#SBATCH -n 64
#SBATCH -o /some/dir/output.log
#SBATCH --qos main

srun your_commands_here
And here is an example using OpenMPI:
#!/bin/bash
#SBATCH --mem-per-cpu 4000
#SBATCH -n 64
#SBATCH -o /some/dir/output.log
#SBATCH --qos main

srun --mpi=pmi2 your_commands_here
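The wrapper scripts mentioned earlier would work by reading SLURM's environment variables; even when launching with plain srun, echoing a few of them near the top of the batch script makes it easy to confirm the allocation in the output log. SLURM_NTASKS and SLURM_JOB_NODELIST are standard SLURM job variables; outside a running job they are unset, so this sketch falls back to placeholders.

```shell
# Log the allocation SLURM gave this job (place before the srun line).
# Outside a SLURM job these variables are unset, hence the fallbacks.
echo "tasks allocated: ${SLURM_NTASKS:-unset}"
echo "node list:       ${SLURM_JOB_NODELIST:-unset}"
```

In the output log you can then check that the task count matches the value requested with #SBATCH -n.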