[MITgcm-support] MPI on SGI

Nikki Lovenduski uclanik at gmail.com
Tue Jun 14 18:14:42 EDT 2011


Hi David et al.,

I did as you recommended below, but continue to receive the same error
message.

Any other ideas or suggestions are certainly welcome!

Thanks for all your help,
Nikki

-----------------------------------------
Hi Nikki,
Maybe a first step would be to replace the line:

$CMD test_16cpu_nl/mitgcmuv_test_16cpu_nl

by

/usr/local/bin/mpiexec -n 16 test_16cpu_nl/mitgcmuv_test_16cpu_nl

just to make sure the problem is not in the shell part of the script.

david
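
A quick way to double-check the launcher itself, independent of the model
(a rough sketch; it assumes the mpiexec path from the job script and the
mpif77/mpicc wrappers from the optfile are all available on the compute
nodes):

    # Does the launcher start 16 ranks across the two nodes at all?
    /usr/local/bin/mpiexec -n 16 hostname

    # Do the run-time launcher and the build-time wrappers come from
    # the same MPI installation?
    which mpif77 mpicc
    which mpiexec

If the hostname test does not print 16 lines spread over both nodes, the
problem is on the launcher side; if it does, the more likely culprit is a
mismatch between mpiexec and the MPI library the executable was linked
against (-lmpich in the optfile).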

>> On Jun 13, 2011, at 11:24, Nikki Lovenduski <uclanik at gmail.com> wrote:
>>
>>> Hi Matt, Ryan, et al.,
>>>
>>> Thanks for your rapid responses.
>>>
>>> I'm compiling an older version of the model (checkpoint 58).  I
>>> tried adding usingMPI =.TRUE., to eedata, but got the same result.
>>>
>>> Here's my optfile:
>>>
>>> ****************************
>>> #!/bin/bash
>>>
>>> FC=`which mpif77`
>>> CC=`which mpicc`
>>> DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -D_BYTESWAPIO -DWORDLENGTH=4'
>>> CPP='cpp  -traditional -P'
>>> MPI='true'
>>> INCLUDEDIRS="$MPIROOT/include $NETCDF/include"
>>> LIBDIRS=$MPIROOT/lib
>>> LIBS="-L/opt/torque/lib -L$MPIROOT/lib -L$NETCDF/lib -lnetcdf -lmpich"
>>> INCLUDES="-I$NETCDF/include"
>>> NOOPTFLAGS='-O0'
>>> S64='$(TOOLSDIR)/set64bitConst.sh'
>>> #  For IEEE, use the "-ffloat-store" option
>>> if test "x$IEEE" = x ; then
>>>    FFLAGS="-Wunused -Wuninitialized -DWORDLENGTH=4 -I$NETCDF/include"
>>>    FOPTIM='-O3 -funroll-loops'
>>> else
>>>    FFLAGS="-Wunused -ffloat-store -DWORDLENGTH=4 -I$NETCDF/include"
>>>    FOPTIM='-O0 '
>>> fi
>>> ****************
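>>>
>>> (A genmake2 call using this optfile would look roughly like the
>>> following; "my_sgi_optfile" is just a placeholder name, and the path
>>> to genmake2 depends on where the build directory sits:)
>>>
>>> ../tools/genmake2 -mpi -of=my_sgi_optfile
>>> make depend
>>> make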
>>>
>>> And here's my job submission script:
>>>
>>> ****************
>>> #!/bin/sh
>>> #PBS -l nodes=2:ppn=8
>>> #PBS -V
>>> #PBS -m ae
>>>
>>> NCPU=`wc -l < $PBS_NODEFILE`
>>> NNODES=`uniq $PBS_NODEFILE | wc -l`
>>>
>>> MPIRUN=/usr/local/bin/mpiexec
>>>
>>> CMD="$MPIRUN -n $NCPU"
>>>
>>> echo "--> Node file: " $PBS_NODEFILE
>>> echo "--> Running on nodes: " `uniq $PBS_NODEFILE`
>>> echo "--> Number of cpus: " $NCPU
>>> echo "--> Number of nodes: " $NNODES
>>> echo "--> Launch command: " $CMD
>>>
>>> cd test_16cpu_nl
>>> $CMD test_16cpu_nl/mitgcmuv_test_16cpu_nl
>>> *****************
>>>
>>> Any additional comments or suggestions would be most helpful!
>>>
>>> Thanks,
>>> Nikki
>>>
>>>
>>>
>>> ------------------------------------------------------
>>> Hi Nikki,
>>>
>>> not sure, but perhaps try adding
>>> usingMPI =.TRUE.,
>>> to eedata
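>>>
>>> i.e. something like this (a sketch of the usual EEPARMS namelist; the
>>> nTx/nTy entries are just the standard single-thread defaults):
>>>
>>>  &EEPARMS
>>>  nTx=1,
>>>  nTy=1,
>>>  usingMPI=.TRUE.,
>>>  &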
>>>
>>>
>>> Though this error doesn't seem to be part of the new code -- what
>>> version are you using?
>>>
>>> -Matt
>>>
>>> -----------------------------------------------------
>>> Hi Nikki,
>>>
>>> It will be easier to figure out what's going on if you tell us which
>>> optfile you are using at build time (specified in your genmake2
>>> command) and also exactly how you are calling the executable at run
>>> time (your job submission script).
>>>
>>> -R
>>>
>>>
>>> On Jun 10, 2011, at 2:53 PM, Nikki Lovenduski wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm trying to run the sector model version of MITgcm on an SGI Altix
>>>> HPCC.  The model runs fine on 1 processor.  However, I am having
>>>> trouble running on 16 processors:  The compilation works fine (I add
>>>> -mpi to my genmake2 command and specify nPx = 2 and nPy = 8 in
>>>> SIZE.h), but I get an error at execution in the routine INI_PROCS.
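>>>>
>>>> (The corresponding fragment of SIZE.h, with the tile-size entries
>>>> sNx, sNy, nSx, nSy left out since only the process counts matter
>>>> here:)
>>>>
>>>>       PARAMETER (
>>>>      &           ...
>>>>      &           nPx =   2,
>>>>      &           nPy =   8,
>>>>      &           ... )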
>>>>
>>>> *** ERROR *** S/R INI_PROCS: No. of processes not equal to
>>>> nPx*nPy    1   16
>>>>
>>>> This error message indicates that MPI is assigning only 1 processor to
>>>> my run, whereas it should be running on nPx*nPy=16 processors.
>>>>
>>>> My job submission script specifies the correct number of processors
>>>> (-n 16).
>>>>
>>>> Any ideas what I'm doing wrong?
>>>>
>>>> Thanks,
>>>> Nikki
>>>>
>>>> ----------------------
>>>> Nikki Lovenduski
>>>> ATOC/INSTAAR
>>>> Univ. Colorado at Boulder