[MITgcm-support] MPI on SGI

Ryan Abernathey rpa at MIT.EDU
Mon Jun 13 14:52:17 EDT 2011


There appears to be a line break between $CMD and your mitgcm executable (the last two lines of your job script). If so, the shell treats them as two separate commands, which would mean that you are not actually using MPI to launch the model.
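
A minimal sketch of what the end of the job script might look like with the launcher and the executable kept on one line, plus a check that the process count matches nPx*nPy before launching. The helper names launch_line and check_procs, the NPX/NPY variables, and the ./ path to the executable are assumptions made for illustration; in the real job script NCPU comes from $PBS_NODEFILE as in your version.

```shell
#!/bin/sh
# Sketch only. MPIRUN and NCPU mirror the quoted job script; NPX, NPY,
# EXE and the two helper functions are hypothetical names for illustration.
MPIRUN=/usr/local/bin/mpiexec
NCPU=16                            # in the job script: NCPU=`wc -l < $PBS_NODEFILE`
NPX=2                              # must match nPx in SIZE.h
NPY=8                              # must match nPy in SIZE.h
EXE=./mitgcmuv_test_16cpu_nl       # assumed path, relative to the run directory

# Print the launcher and the executable as ONE command line.
launch_line() {
    echo "$MPIRUN -n $NCPU $EXE"
}

# Succeed only if the number of MPI processes equals nPx*nPy.
check_procs() {
    [ "$NCPU" -eq $((NPX * NPY)) ]
}

# Broken form: a line break makes these two separate commands, so the
# executable is started on its own rather than through the MPI launcher:
#   $CMD
#   ./mitgcmuv_test_16cpu_nl

if check_procs; then
    echo "--> Full launch line: $(launch_line)"
    # Uncomment on the cluster to actually start the run:
    # $MPIRUN -n $NCPU $EXE
else
    echo "--> NCPU=$NCPU does not equal nPx*nPy=$((NPX * NPY))" >&2
fi
```

Echoing the full launch line, executable included, makes a stray line break easy to spot in the job output.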

-R


On Jun 13, 2011, at 11:24, Nikki Lovenduski <uclanik at gmail.com> wrote:

> Hi Matt, Ryan, et al.,
> 
> Thanks for your rapid responses.  
> 
> I'm compiling an older version of the model (checkpoint 58).  I tried adding usingMPI =.TRUE., to eedata, but got the same result.
> 
> Here's my optfile:
> 
> ****************************
> #!/bin/bash
> 
> FC=`which mpif77`
> CC=`which mpicc`
> DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -D_BYTESWAPIO -DWORDLENGTH=4'
> CPP='cpp  -traditional -P'
> MPI='true'
> INCLUDEDIRS="$MPIROOT/include $NETCDF/include"
> LIBDIRS=$MPIROOT/lib
> LIBS="-L/opt/torque/lib -L$MPIROOT/lib -L$NETCDF/lib -lnetcdf -lmpich"
> INCLUDES="-I$NETCDF/include"
> NOOPTFLAGS='-O0'
> S64='$(TOOLSDIR)/set64bitConst.sh'
> #  For IEEE, use the "-ffloat-store" option
> if test "x$IEEE" = x ; then
>     FFLAGS="-Wunused -Wuninitialized -DWORDLENGTH=4 -I$NETCDF/include"
>     FOPTIM='-O3 -funroll-loops'
> else
>     FFLAGS="-Wunused -ffloat-store -DWORDLENGTH=4 -I$NETCDF/include"
>     FOPTIM='-O0 '
> fi
> ****************
> 
> And here's my job submission script:
> 
> ****************
> #!/bin/sh
> #PBS -l nodes=2:ppn=8
> #PBS -V
> #PBS -m ae
> 
> NCPU=`wc -l < $PBS_NODEFILE`
> NNODES=`uniq $PBS_NODEFILE | wc -l`
> 
> MPIRUN=/usr/local/bin/mpiexec
> 
> CMD="$MPIRUN -n $NCPU"
> 
> echo "--> Node file: " $PBS_NODEFILE
> echo "--> Running on nodes: " `uniq $PBS_NODEFILE`
> echo "--> Number of cpus: " $NCPU
> echo "--> Number of nodes: " $NNODES
> echo "--> Launch command: " $CMD
> 
> cd test_16cpu_nl
> $CMD test_16cpu_nl/mitgcmuv_test_16cpu_nl
> *****************
> 
> Any additional comments or suggestions would be most helpful!
> 
> Thanks,
> Nikki
> 
> 
> 
> ------------------------------------------------------
> Hi Nikki,
> 
> not sure, but perhaps try adding
>  usingMPI =.TRUE.,
> to eedata
> 
> 
> Though this error doesn't seem to be part of the new code -- what
> version are you using?
> 
> -Matt
> 
> -----------------------------------------------------
> Hi Nikki,
> 
> It will be easier to figure out what's going on if you tell us which
> optfile you are using at build time (specified in your genmake2
> command) and also exactly how you are calling the executable at run
> time (your job submission script).
> 
> -R
> 
> 
> On Jun 10, 2011, at 2:53 PM, Nikki Lovenduski wrote:
> 
> > Hi all,
> >
> > I'm trying to run the sector model version of MITgcm on an SGI Altix
> > HPCC.  The model runs fine on 1 processor.  However, I am having
> > trouble running on 16 processors:  The compilation works fine (I add
> > -mpi to my genmake2 command and specify nPx = 2 and nPy = 8 in
> > SIZE.h), but I get an error at execution in the routine INI_PROCS.
> >
> > *** ERROR *** S/R INI_PROCS: No. of processes not equal to
> > nPx*nPy    1   16
> >
> > This error message indicates that MPI only assigns 1 processor to my
> > run, whereas it should be running on nPx*nPy=16 processors.
> >
> > My job submission script specifies the correct number of processors
> > (-n 16).
> >
> > Any ideas what I'm doing wrong?
> >
> > Thanks,
> > Nikki
> >
> > ----------------------
> > Nikki Lovenduski
> > ATOC/INSTAAR
> > Univ. Colorado at Boulder
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support


