[MITgcm-support] Troubleshooting OpenMPI Issues with mpiexec for Jasper (Westgrid Cluster)

Jean-Michel Campin jmc at ocean.mit.edu
Wed Jan 6 14:30:08 EST 2016


Hi Benjamin,

It looks like the mpiexec command does not recognize that you
want to run on 4 processes:
> (PID.TID 0000.0001) *** ERROR *** EEBOOT_MINIMAL: No. of procs=     1

Did you try to compile and run a simple "hello_world" type program
to check that your installed MPI is working as expected?
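For reference, here is a minimal MPI "hello_world" in C (a sketch; compile it with the same mpicc listed in your .opt file and launch it exactly the way you launch mitgcmuv):

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal MPI check: each rank reports its rank and the total size.
 * Build:  mpicc hello_mpi.c -o hello_mpi
 * Run:    mpiexec -n 4 ./hello_mpi
 */
int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

If "mpiexec -n 4" prints "of 1" (or only a single line), the launcher and the MPI library do not match, and MITgcm will see the same single-process world.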

Cheers,
Jean-Michel

On Tue, Jan 05, 2016 at 08:23:00AM -0700, Benjamin Ocampo wrote:
> Hi All:
> 
> I am having a problem using OpenMPI on the Jasper cluster on Westgrid,
> and it involves the mpiexec command. Note that I use mpiexec instead of
> mpirun because of an unresolved issue with mpirun: it cannot find the
> shared library "libmpi.so.1".
> 
> The process for compiling and running the code is as follows:
> 
> 1   $ROOT_DIR/tools/genmake2 -mods $ROTATING_TANK/code -mpi -of ~/MITgcm/tools/build_options/jasper_mpi2.opt
> 2   make depend
> 3   make
> 4   mpdboot      # starts the MPD daemons so that mpiexec can launch processes
> 5   mpiexec -n 4 ./mitgcmuv
> 6   mpdallexit
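One thing worth checking in the sequence above: mpdboot/mpdallexit belong to MPICH2's MPD process manager, not to OpenMPI, so the mpiexec found first in PATH may not be the OpenMPI 1.6.5 one used at link time. A quick diagnostic (a sketch; the paths shown are assumptions):

```shell
# Which launcher is first in PATH, and which MPI implementation does it report?
which mpiexec
mpiexec --version

# Which MPI library is the executable actually linked against?
ldd ./mitgcmuv | grep -i mpi

# Both should point into the same installation tree, e.g.
# /global/software/openmpi/openmpi-1.6.5-intel/...
```

If the launcher and the linked library come from different MPI installations, each process initializes its own single-process world, which matches the "No. of procs= 1" error below.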
> 
> The "jasper_mpi2.opt" file is as follows (based on the build options in
> <
> http://mitgcm.org/download/daily_snapshot/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas
> >):
> 
> FC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpif90
> CC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpicc
>
> DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -DWORDLENGTH=4'
> CPP='/lib/cpp -traditional -P'
> EXTENDED_SRC_FLAG='-132'
> OMPFLAG='-openmp'
> CFLAGS='-fPIC'
> LDADD='-shared-intel'
>
> LIBS='-L/global/software/openmpi/openmpi-1.6.5-intel/lib -lmpi -L/global/software/netcdf/netcdf-4.1.3/lib -lnetcdf'
> INCLUDES='-I/global/software/openmpi/openmpi-1.6.5-intel/include -I/global/software/netcdf/netcdf-4.1.3/include'
>
> NOOPTFLAGS='-O0'
> 
> with SIZE.h as:
> 
>      &           sNx =  30,
>      &           sNy =  23,
>      &           OLx =   1,
>      &           OLy =   1,
>      &           nSx =   1,
>      &           nSy =   1,
>      &           nPx =   4,
>      &           nPy =   1,
>      &           Nx  = sNx*nSx*nPx,
>      &           Ny  = sNy*nSy*nPy,
>      &           Nr  =  29)
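As a quick sanity check of the decomposition above (a sketch; the numbers are copied from the SIZE.h fragment, and the script itself is illustrative, not part of MITgcm):

```python
# Tile and process counts from SIZE.h
sNx, sNy = 30, 23   # grid points per tile
nSx, nSy = 1, 1     # tiles per process
nPx, nPy = 4, 1     # processes in x and y

Nx = sNx * nSx * nPx          # global grid points in x
Ny = sNy * nSy * nPy          # global grid points in y
nprocs = nPx * nPy            # processes EEBOOT_MINIMAL expects

print(Nx, Ny, nprocs)         # -> 120 23 4
```

So mpiexec really does need to deliver 4 processes for this SIZE.h.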
> 
> and "eedata" as:
> 
>  &EEPARMS
>  nTx=1,
>  nTy=1,
>  usingMPI=.TRUE.,
>  &
> 
> However, when I run the code, I get the following error message:
> 
> (PID.TID 0000.0001) *** ERROR *** EEBOOT_MINIMAL: No. of procs=     1 not equal to nPx*nPy=     4
> (PID.TID 0000.0001) *** ERROR *** EEDIE: earlier error in multi-proc/thread setting
> (PID.TID 0000.0001) *** ERROR *** PROGRAM MAIN: ends with fatal Error
> 
> This error message is a bit strange to me because I set the number
> of processors to 4 in the batch script job submission. Is there a way
> to resolve this issue?
> 
> Cheers,
> Benjamin

> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support



