[MITgcm-support] Troubleshooting OpenMPI Issues with mpiexec for Jasper (Westgrid Cluster)
Benjamin Ocampo
rurik at ualberta.ca
Tue Jan 5 10:23:00 EST 2016
Hi All:
I am having a problem running MITgcm with OpenMPI on the Jasper cluster
(Westgrid), and it involves the mpiexec command. Note that I am using
mpiexec instead of mpirun because I have been unable to resolve a separate
issue in which mpirun cannot find the shared library "libmpi.so.1".
The process for compiling and running the code is as follows:
$ROOT_DIR/tools/genmake2 -mods $ROTATING_TANK/code -mpi -of ~/MITgcm/tools/build_options/jasper_mpi2.opt
make depend
make
mpdboot   # ensures that mpiexec can communicate with the processors
mpiexec -n 4 ./mitgcmuv
mpdallexit
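One thing that may be worth checking here, offered as an assumption rather than a confirmed diagnosis: mpdboot and mpdallexit belong to the MPD process manager shipped with MPICH2, not to Open MPI, so the mpiexec found on the PATH may come from a different MPI installation than the mpif90 used to build mitgcmuv. The sketch below compares the directory of the mpiexec on the PATH with the directory of the compiler wrapper named in the build options file. The helper name same_mpi_install is hypothetical, and the comparison does not follow symbolic links.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when both tools live in the same directory.
# Symbolic links are not resolved, so this is only a first-pass check.
same_mpi_install() {
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

# Compiler wrapper path, taken from FC in jasper_mpi2.opt.
MPI_FC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpif90

# The launcher that the shell would actually run; placeholder if none is found.
LAUNCHER=$(command -v mpiexec || echo "not-found/mpiexec")

if same_mpi_install "$LAUNCHER" "$MPI_FC"; then
    echo "mpiexec comes from the same directory as mpif90"
else
    echo "mpiexec ($LAUNCHER) is not in $(dirname "$MPI_FC")"
fi
```

If the two directories differ, running the launcher by its full path from the same bin directory as mpif90 would be one way to rule out a mismatch.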
The "jasper_mpi2.opt" file is written as follows (based on another build
options file, seen at
<http://mitgcm.org/download/daily_snapshot/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas>):
FC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpif90
CC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpicc

DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -DWORDLENGTH=4'
CPP='/lib/cpp -traditional -P'
EXTENDED_SRC_FLAG='-132'
OMPFLAG='-openmp'
CFLAGS='-fPIC'
LDADD='-shared-intel'

LIBS='-L/global/software/openmpi/openmpi-1.6.5-intel/lib -lmpi -L/global/software/netcdf/netcdf-4.1.3/lib -lnetcdf'
INCLUDES='-I/global/software/openmpi/openmpi-1.6.5-intel/include -I/global/software/netcdf/netcdf-4.1.3/include'

NOOPTFLAGS='-O0'
with SIZE.h as:
     & sNx = 30,
     & sNy = 23,
     & OLx = 1,
     & OLy = 1,
     & nSx = 1,
     & nSy = 1,
     & nPx = 4,
     & nPy = 1,
     & Nx = sNx*nSx*nPx,
     & Ny = sNy*nSy*nPy,
     & Nr = 29)
and "eedata" as:
 &EEPARMS
 nTx=1,
 nTy=1,
 usingMPI=.TRUE.,
 &
However, when I run the code, I get the following error message:
(PID.TID 0000.0001) *** ERROR *** EEBOOT_MINIMAL: No. of procs= 1 not equal to nPx*nPy= 4
(PID.TID 0000.0001) *** ERROR *** EEDIE: earlier error in multi-proc/thread setting
(PID.TID 0000.0001) *** ERROR *** PROGRAM MAIN: ends with fatal Error
This error message puzzles me because I set the number of processors
to 4 in the batch job submission script. Is there a way to resolve
this issue?
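For reference, the EEBOOT_MINIMAL message compares the number of processes each executable sees with nPx*nPy from SIZE.h. The sketch below reproduces that comparison in shell for the values in this post. The helper names expected_procs and check_procs are hypothetical and are not part of MITgcm.

```shell
#!/bin/sh
# Number of MPI processes expected, given nPx and nPy from SIZE.h.
expected_procs() {
    echo $(( $1 * $2 ))
}

# Compare a process count against nPx*nPy.
# $1 = process count, $2 = nPx, $3 = nPy
check_procs() {
    requested=$1
    expected=$(expected_procs "$2" "$3")
    if [ "$requested" -eq "$expected" ]; then
        echo "ok: $requested processes, $expected expected"
    else
        echo "mismatch: $requested processes, $expected expected"
        return 1
    fi
}

# Values from this post: mpiexec -n 4 with nPx = 4, nPy = 1.
check_procs 4 4 1
```

With -n 4, nPx = 4 and nPy = 1 the requested and expected counts agree, so the "No. of procs= 1" in the error suggests that each running copy of mitgcmuv saw only one process, regardless of what was requested.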
Cheers,
Benjamin