[MITgcm-support] Troubleshooting OpenMPI Issues with mpiexec for Jasper (Westgrid Cluster)

Benjamin Ocampo rurik at ualberta.ca
Tue Jan 5 10:23:00 EST 2016


Hi All:

I am having a problem running MITgcm with OpenMPI on the Jasper cluster
(Westgrid), and it involves the command mpiexec. Note that I used mpiexec
instead of mpirun because of a separate issue that I have not been able
to resolve: mpirun cannot find the shared library "libmpi.so.1".

The process for compiling and running the code is as follows:

    $ROOT_DIR/tools/genmake2 -mods $ROTATING_TANK/code -mpi -of ~/MITgcm/tools/build_options/jasper_mpi2.opt
    make depend
    make
    mpdboot  # Ensures that mpiexec communicates with processors
    mpiexec -n 4 ./mitgcmuv
    mpdallexit
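
One check that may be worth running at this point, offered only as a rough
sketch: mpdboot and mpdallexit belong to MPICH's MPD process manager rather
than to OpenMPI, so the mpiexec found on PATH might come from a different MPI
installation than the mpif90/mpicc wrappers named in the build options file
further down. A launcher from another MPI can leave each process believing it
is the only rank. The helper name classify_launcher is hypothetical, and the
directory is copied from the FC/CC lines of jasper_mpi2.opt.

```shell
#!/bin/sh
# Directory of the OpenMPI wrappers, copied from the FC/CC lines of jasper_mpi2.opt.
OMPI_BIN=/global/software/openmpi/openmpi-1.6.5-intel/bin

# Hypothetical helper: classify a launcher path relative to OMPI_BIN.
classify_launcher() {
  case "$1" in
    "")            echo "missing" ;;
    "$OMPI_BIN"/*) echo "match" ;;
    *)             echo "mismatch" ;;
  esac
}

# Resolve the mpiexec the shell would actually run (empty if none is found).
launcher=$(command -v mpiexec 2>/dev/null || true)
echo "mpiexec on PATH: ${launcher:-none}"
echo "compared with $OMPI_BIN: $(classify_launcher "$launcher")"
```

If this reports "mismatch", one thing to try is calling the launcher by its
full path, for example $OMPI_BIN/mpiexec -n 4 ./mitgcmuv, without mpdboot.
Note that the comparison is purely textual, so a symlinked launcher would
also be reported as a mismatch.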

The "jasper_mpi2.opt" file is written as follows (based on the build
options file at
<http://mitgcm.org/download/daily_snapshot/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas>):

    FC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpif90
    CC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpicc

    DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -DWORDLENGTH=4'
    CPP='/lib/cpp  -traditional -P'
    EXTENDED_SRC_FLAG='-132'
    OMPFLAG='-openmp'
    CFLAGS='-fPIC'
    LDADD='-shared-intel'

    LIBS='-L/global/software/openmpi/openmpi-1.6.5-intel/lib -lmpi -L/global/software/netcdf/netcdf-4.1.3/lib -lnetcdf'
    INCLUDES='-I/global/software/openmpi/openmpi-1.6.5-intel/include -I/global/software/netcdf/netcdf-4.1.3/include'

    NOOPTFLAGS='-O0'

with SIZE.h as:

     &           sNx =  30,
     &           sNy =  23,
     &           OLx =   1,
     &           OLy =   1,
     &           nSx =   1,
     &           nSy =   1,
     &           nPx =   4,
     &           nPy =   1,
     &           Nx  = sNx*nSx*nPx,
     &           Ny  = sNy*nSy*nPy,
     &           Nr  =  29)

and "eedata" as:

     &EEPARMS
     nTx=1,
     nTy=1,
     usingMPI=.TRUE.,
     &

However, when I run the code, I get the following error message:

    (PID.TID 0000.0001) *** ERROR *** EEBOOT_MINIMAL: No. of procs=     1 not equal to nPx*nPy=     4
    (PID.TID 0000.0001) *** ERROR *** EEDIE: earlier error in multi-proc/thread setting
    (PID.TID 0000.0001) *** ERROR *** PROGRAM MAIN: ends with fatal Error
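
The comparison behind this message can be illustrated with a small sketch.
According to the error text, the number of MPI processes must equal nPx*nPy
from SIZE.h, which is 4*1 = 4 here, while the run reported 1. The helper
names get_param and check_procs are hypothetical, and the two SIZE.h lines
are copied from the excerpt above.

```shell
#!/bin/sh
# Tile-count lines copied from the SIZE.h excerpt in this post.
size_h='     &           nPx =   4,
     &           nPy =   1,'

# Print the integer assigned to parameter $1 in SIZE.h-style text on stdin.
get_param() {
  sed -n "s/.*[^A-Za-z0-9_]$1[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

nPx=$(printf '%s\n' "$size_h" | get_param nPx)
nPy=$(printf '%s\n' "$size_h" | get_param nPy)
expected=$((nPx * nPy))

# Report whether a given MPI process count satisfies nPx*nPy.
check_procs() {
  if [ "$1" -eq "$expected" ]; then
    echo "ok: $1 processes match nPx*nPy=$expected"
  else
    echo "mismatch: $1 processes, but nPx*nPy=$expected"
  fi
}

check_procs 4   # the count requested with mpiexec -n 4
check_procs 1   # the count reported in the error message
```

Since -n 4 agrees with SIZE.h, the sketch suggests the settings themselves
are consistent and that the executable is simply not seeing the other
three processes at startup.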

This error message puzzles me because I requested 4 processors in the
batch job submission script. Is there a way to resolve this issue?

Cheers,
Benjamin