[MITgcm-support] MPI on SGI

David Ferreira dfer at mit.edu
Mon Jun 13 17:38:29 EDT 2011


Hi Nikki,
Maybe a first step would be to replace the line:

$CMD test_16cpu_nl/mitgcmuv_test_16cpu_nl

by

/usr/local/bin/mpiexec -n 16 test_16cpu_nl/mitgcmuv_test_16cpu_nl

just to make sure the problem is not in the shell part of the script.
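A minimal sketch of what the last lines of the job script would then look like (only the launch line changes; the mpiexec path is the one already assigned to MPIRUN further down in that script, and everything above these lines stays as is):

****************
cd test_16cpu_nl
/usr/local/bin/mpiexec -n 16 test_16cpu_nl/mitgcmuv_test_16cpu_nl
****************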

david





On 6/13/11 5:32 PM, Ryan Abernathey wrote:
> Scratch that last comment! It was just an artifact of reading your 
> email on my phone!
> -R
>
>
> On Jun 13, 2011, at 2:52 PM, Ryan Abernathey wrote:
>
>> There appears to be a line break between $CMD and your mitgcm 
>> executable (last two lines of your job script). This would mean that 
>> you are not actually using MPI to call the GCM.
>>
>> -R
>>
>> Sent from my iPhone
>>
>> On Jun 13, 2011, at 11:24, Nikki Lovenduski <uclanik at gmail.com> wrote:
>>
>>> Hi Matt, Ryan, et al.,
>>>
>>> Thanks for your rapid responses.
>>>
>>> I'm compiling an older version of the model (checkpoint 58).  I 
>>> tried adding usingMPI =.TRUE., to eedata, but got the same result.
>>>
>>> Here's my optfile:
>>>
>>> ****************************
>>> #!/bin/bash
>>>
>>> FC=`which mpif77`
>>> CC=`which mpicc`
>>> DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -D_BYTESWAPIO -DWORDLENGTH=4'
>>> CPP='cpp  -traditional -P'
>>> MPI='true'
>>> INCLUDEDIRS="$MPIROOT/include $NETCDF/include"
>>> LIBDIRS=$MPIROOT/lib
>>> LIBS="-L/opt/torque/lib -L$MPIROOT/lib -L$NETCDF/lib -lnetcdf -lmpich"
>>> INCLUDES="-I$NETCDF/include"
>>> NOOPTFLAGS='-O0'
>>> S64='$(TOOLSDIR)/set64bitConst.sh'
>>> #  For IEEE, use the "-ffloat-store" option
>>> if test "x$IEEE" = x ; then
>>>    FFLAGS="-Wunused -Wuninitialized -DWORDLENGTH=4 -I$NETCDF/include"
>>>    FOPTIM='-O3 -funroll-loops'
>>> else
>>>    FFLAGS="-Wunused -ffloat-store -DWORDLENGTH=4 -I$NETCDF/include"
>>>    FOPTIM='-O0 '
>>> fi
>>> ****************
>>>
>>> And here's my job submission script:
>>>
>>> ****************
>>> #!/bin/sh
>>> #PBS -l nodes=2:ppn=8
>>> #PBS -V
>>> #PBS -m ae
>>>
>>> NCPU=`wc -l < $PBS_NODEFILE`
>>> NNODES=`uniq $PBS_NODEFILE | wc -l`
>>>
>>> MPIRUN=/usr/local/bin/mpiexec
>>>
>>> CMD="$MPIRUN -n $NCPU"
>>>
>>> echo "--> Node file: " $PBS_NODEFILE
>>> echo "--> Running on nodes: " `uniq $PBS_NODEFILE`
>>> echo "--> Number of cpus: " $NCPU
>>> echo "--> Number of nodes: " $NNODES
>>> echo "--> Launch command: " $CMD
>>>
>>> cd test_16cpu_nl
>>> $CMD test_16cpu_nl/mitgcmuv_test_16cpu_nl
>>> *****************
>>>
>>> Any additional comments or suggestions would be most helpful!
>>>
>>> Thanks,
>>> Nikki
>>>
>>>
>>>
>>> ------------------------------------------------------
>>> Hi Nikki,
>>>
>>> not sure, but perhaps try adding
>>> usingMPI =.TRUE.,
>>> to eedata
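For reference, eedata is a Fortran namelist file; a minimal sketch with that line added could look like the following (the nTx/nTy thread counts here are placeholders for whatever the setup already uses):

****************
# Example "eedata" sketch
 &EEPARMS
 nTx=1,
 nTy=1,
 usingMPI=.TRUE.,
 &
****************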
>>>
>>>
>>> Though this error doesn't seem to be part of the new code -- what
>>> version are you using?
>>>
>>> -Matt
>>>
>>> -----------------------------------------------------
>>> Hi Nikki,
>>>
>>> It will be easier to figure out what's going on if you tell us which
>>> optfile you are using at build time (specified in your genmake2
>>> command) and also exactly how you are calling the executable at run
>>> time (your job submission script).
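For reference, such a build typically looks something like this (the mods directory and optfile path are placeholders):

****************
genmake2 -mpi -mods=../code -of=/path/to/sgi_mpi_optfile
make depend
make
****************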
>>>
>>> -R
>>>
>>>
>>> On Jun 10, 2011, at 2:53 PM, Nikki Lovenduski wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm trying to run the sector model version of MITgcm on an SGI Altix
>>>> HPCC.  The model runs fine on 1 processor.  However, I am having
>>>> trouble running on 16 processors:  The compilation works fine (I add
>>>> -mpi to my genmake2 command and specify nPx = 2 and nPy = 8 in
>>>> SIZE.h), but I get an error at execution in the routine INI_PROCS.
>>>>
>>>> *** ERROR *** S/R INI_PROCS: No. of processes not equal to
>>>> nPx*nPy    1   16
>>>>
>>>> This error message indicates that MPI assigns only 1 process to my
>>>> run, whereas it should be running on nPx*nPy = 16 processes.
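For context, the decomposition that INI_PROCS checks against comes from the PARAMETER block in SIZE.h; a sketch of the relevant fragment, with nPx = 2 and nPy = 8 as quoted above (the tile sizes, overlaps and Nr shown here are only placeholders):

****************
      INTEGER sNx, sNy, OLx, OLy, nSx, nSy, nPx, nPy, Nx, Ny, Nr
      PARAMETER (
     &           sNx =  32,
     &           sNy =  32,
     &           OLx =   3,
     &           OLy =   3,
     &           nSx =   1,
     &           nSy =   1,
     &           nPx =   2,
     &           nPy =   8,
     &           Nx  = sNx*nSx*nPx,
     &           Ny  = sNy*nSy*nPy,
     &           Nr  =  30 )
****************
The model aborts because the number of MPI processes it is handed (1) does not match nPx*nPy (16).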
>>>>
>>>> My job submission script specifies the correct number of processors
>>>> (-n 16).
>>>>
>>>> Any ideas what I'm doing wrong?
>>>>
>>>> Thanks,
>>>> Nikki
>>>>
>>>> ----------------------
>>>> Nikki Lovenduski
>>>> ATOC/INSTAAR
>>>> Univ. Colorado at Boulder



