[MITgcm-support] Troubleshooting OpenMPI Issues with mpiexec for Jasper (Westgrid Cluster)

Benjamin Ocampo rurik at ualberta.ca
Wed Jan 6 18:29:25 EST 2016


Hi Jean-Michel:

I ran a "hello_world.f" program and it does not appear to be working; the
output for n=12 processes is as follows:

 Hello world from            0
 Hello world from            0
 Hello world from            0
 Hello world from            0
 Hello world from            0
 Hello world from            0
 Hello world from            0
 Hello world from            0
 Hello world from            0
 Hello world from            0
 Hello world from            0
 Hello world from            0

with hello_world.f written as:


      PROGRAM HELLO
C     Minimal MPI check: each process should report its own rank
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER error, rank

      CALL MPI_INIT(error)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, error)
      PRINT *, "Hello world from ", rank
      CALL MPI_FINALIZE(error)

      STOP
      END
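
One thing I want to double-check (just a sketch of what I have in mind,
with the wrapper path taken from my jasper_mpi2.opt and the executable
name as a placeholder) is building and launching the test entirely from
that same Open MPI tree:

  /global/software/openmpi/openmpi-1.6.5-intel/bin/mpif90 hello_world.f -o hello_world
  /global/software/openmpi/openmpi-1.6.5-intel/bin/mpiexec -n 12 ./hello_world

If every rank still prints 0 that way, at least the launcher and the
libraries will be known to come from the same installation.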

I will reinvestigate using mpirun as well, and figure out why all 12
processes report rank 0 instead of their own ranks.
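
As a first step (a sketch of the checks I have in mind; the library path
is the one from my optfile), I want to see which launcher is actually
being picked up and point the run-time linker at the Open MPI libraries
that mpirun complains about:

  which mpirun mpiexec
  mpirun --version
  export LD_LIBRARY_PATH=/global/software/openmpi/openmpi-1.6.5-intel/lib:$LD_LIBRARY_PATH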

Cheers,
Benjamin


On Wed, Jan 6, 2016 at 12:30 PM, Jean-Michel Campin <jmc at ocean.mit.edu>
wrote:

> Hi Benjamin,
>
> looks like the mpiexec command does not recognize that you
> want to run on 4 procs:
> > (PID.TID 0000.0001) *** ERROR *** EEBOOT_MINIMAL: No. of procs=     1
>
> Did you try to compile and run a simple "hello_world" type program
> to check that the installed MPI is working as expected?
>
> Cheers,
> Jean-Michel
>
> On Tue, Jan 05, 2016 at 08:23:00AM -0700, Benjamin Ocampo wrote:
> > Hi All:
> >
> > I am having a problem using Open MPI for the Jasper cluster on Westgrid,
> > and it involves the command mpiexec. Note that I used mpiexec instead
> > of mpirun because I have not been able to resolve a separate issue in
> > which mpirun cannot find the shared library ''libmpi.so.1''.
> >
> > The process for compiling and running the code is as follows:
> >
> > 1   $ROOT_DIR/tools/genmake2 -mods $ROTATING_TANK/code -mpi -of ~/MITgcm/tools/build_options/jasper_mpi2.opt
> > 2   make depend
> > 3   make
> > 4   mpdboot  #Ensures that mpiexec communicates with processors
> > 5   mpiexec -n 4 ./mitgcmuv
> > 6   mpdallexit
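> >
> > (One check I have not made yet, in case it is relevant: confirming that
> > the mpiexec picked up in step 5 comes from the same Open MPI tree used
> > to build mitgcmuv, and not from a different MPI installation. Roughly:
> >
> >   which mpiexec
> >   /global/software/openmpi/openmpi-1.6.5-intel/bin/mpiexec -n 4 ./mitgcmuv
> >
> > with the full path as listed in jasper_mpi2.opt.)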
> >
> > The ''jasper_mpi2.opt'' is written as follows (based on the build options
> > file at
> > http://mitgcm.org/download/daily_snapshot/MITgcm/tools/build_options/linux_amd64_ifort+mpi_ice_nas ):
> >
> > FC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpif90
> > CC=/global/software/openmpi/openmpi-1.6.5-intel/bin/mpicc
> >
> > DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -DWORDLENGTH=4'
> > CPP='/lib/cpp  -traditional -P'
> > EXTENDED_SRC_FLAG='-132'
> > OMPFLAG='-openmp'
> > CFLAGS='-fPIC'
> > LDADD='-shared-intel'
> >
> > LIBS='-L/global/software/openmpi/openmpi-1.6.5-intel/lib -lmpi -L/global/software/netcdf/netcdf-4.1.3/lib -lnetcdf'
> > INCLUDES='-I/global/software/openmpi/openmpi-1.6.5-intel/include -I/global/software/netcdf/netcdf-4.1.3/include'
> >
> > NOOPTFLAGS='-O0'
> >
> > with SIZE.h as:
> >
> >      &           sNx =  30,
> >      &           sNy =  23,
> >      &           OLx =   1,
> >      &           OLy =   1,
> >      &           nSx =   1,
> >      &           nSy =   1,
> >      &           nPx =   4,
> >      &           nPy =   1,
> >      &           Nx  = sNx*nSx*nPx,
> >      &           Ny  = sNy*nSy*nPy,
> >      &           Nr  =  29)
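> >
> > (With these settings the run expects nPx*nPy = 4*1 = 4 MPI processes,
> > one 30x23 tile each, for a full grid of Nx = 30*1*4 = 120 by
> > Ny = 23*1*1 = 23 points.)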
> >
> > and ''eedata'' as:
> >
> >  &EEPARMS
> >  nTx=1,
> >  nTy=1,
> >  usingMPI=.TRUE.,
> >  &
> >
> > However, when I run the code, I get the following error message:
> >
> > (PID.TID 0000.0001) *** ERROR *** EEBOOT_MINIMAL: No. of procs=     1 not equal to nPx*nPy=     4
> > (PID.TID 0000.0001) *** ERROR *** EEDIE: earlier error in multi-proc/thread setting
> > (PID.TID 0000.0001) *** ERROR *** PROGRAM MAIN: ends with fatal Error
> >
> > This error message is a bit strange to me because I set the number
> > of processors to 4 in the batch job submission script. Is there a way
> > to resolve this issue?
> >
> > Cheers,
> > Benjamin
>

