[MITgcm-support] Troubleshooting OpenMPI Issues with mpiexec for Jasper (Westgrid Cluster)
Jean-Michel Campin
jmc at ocean.mit.edu
Wed Jan 6 14:30:08 EST 2016
Hi Benjamin,
It looks like the mpiexec command does not recognize that you
want to run on 4 processes:
> (PID.TID 0000.0001) *** ERROR *** EEBOOT_MINIMAL: No. of procs= 1
Did you try to compile and run a simple "hello_world" type program
to check that your installed MPI is working as expected?
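For example, a minimal MPI hello_world in C would look like this
(build it with the cluster's mpicc and launch it the same way you
launch mitgcmuv):

```c
/* hello_mpi.c -- minimal check that the launcher starts the expected
 * number of MPI processes.
 * Build: mpicc hello_mpi.c -o hello_mpi
 * Run:   mpiexec -n 4 ./hello_mpi            */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total processes launched */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

If it prints "of 1" four times instead of "of 4", the launcher you are
calling does not match the MPI library the program was linked against.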
Cheers,
Jean-Michel
On Tue, Jan 05, 2016 at 08:23:00AM -0700, Benjamin Ocampo wrote:
> Hi All:
>
> I am having a problem using OpenMPI on the Jasper cluster on Westgrid,
> and it involves the command mpiexec. Note that I used mpiexec instead
> of mpirun because of an unresolved issue in which mpirun cannot find
> the shared library ''libmpi.so.1''.
>
> The process for compiling and running the code is as follows:
>
> 1 $ROOT_DIR/tools/genmake2 -mods $ROTATING_TANK/code -mpi -of
> ~/MITgcm/tools/build_options/jasper_mpi2.opt
> 2 make depend
> 3 make
> 4 mpdboot   # starts the MPD daemons that mpiexec uses to launch processes
> 5 mpiexec -n 4 ./mitgcmuv
> 6 mpdallexit
>
> The ''jasper_mpi2.opt'' file is written as follows (based on the
> build-options file at
> <
> http://mitgcm.org/download/daily_snapshot/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas
> >):
>
> FC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpif90
> CC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpicc
>
> DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -DWORDLENGTH=4'
> CPP='/lib/cpp -traditional -P'
> EXTENDED_SRC_FLAG='-132'
> OMPFLAG='-openmp'
> CFLAGS='-fPIC'
> LDADD='-shared-intel'
>
> LIBS='-L/global/software/openmpi/openmpi-1.6.5-intel/lib -lmpi
>       -L/global/software/netcdf/netcdf-4.1.3/lib -lnetcdf'
> INCLUDES='-I/global/software/openmpi/openmpi-1.6.5-intel/include
>           -I/global/software/netcdf/netcdf-4.1.3/include'
>
> NOOPTFLAGS='-O0'
>
> with SIZE.h as:
>
>      & sNx = 30,
>      & sNy = 23,
>      & OLx = 1,
>      & OLy = 1,
>      & nSx = 1,
>      & nSy = 1,
>      & nPx = 4,
>      & nPy = 1,
>      & Nx  = sNx*nSx*nPx,
>      & Ny  = sNy*nSy*nPy,
>      & Nr  = 29)
>
> and ''eedata'' as:
>
>  &EEPARMS
>  nTx=1,
>  nTy=1,
>  usingMPI=.TRUE.,
>  &
>
> However, when I run the code, I get the following error message:
>
> (PID.TID 0000.0001) *** ERROR *** EEBOOT_MINIMAL: No. of procs= 1
>   not equal to nPx*nPy= 4
> (PID.TID 0000.0001) *** ERROR *** EEDIE: earlier error in
>   multi-proc/thread setting
> (PID.TID 0000.0001) *** ERROR *** PROGRAM MAIN: ends with fatal Error
>
> This error message is a bit strange to me, because I set the number
> of processors to 4 in the batch-script job submission. Is there a way
> to resolve this issue?
>
> Cheers,
> Benjamin
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support