[MITgcm-support] mpi problem after cvs update
m. r. schaferkotter
schaferk at bellsouth.net
Sat Jan 23 21:53:50 EST 2010
JM:
thanks for the suggestions.
a colleague suggested that the problem could be either the
'-Mipa=fast' FOPTIM option or one of the includes. a web search on the
error message pointed to possible mis-compilation issues with the mpi code.
turns out it was the include. the old include is
#INCLUDES="-I/opt/mpt/3.2.0/xt/mpich2-pgi64/include"
while 'module list' yields (among others) the default module
xt-mpt/3.4.0
i changed the include to the 3.4.0 version.
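for anyone hitting the same thing, here is a rough sketch of a check that would have flagged the mismatch. it is only an illustration: the function names are made up, and it assumes the include path always has the form /opt/mpt/VERSION/... and the module is named xt-mpt/VERSION, as in the two lines above.

```shell
# Hypothetical helper: pull the MPT version out of an include flag such as
# -I/opt/mpt/3.2.0/xt/mpich2-pgi64/include (path layout is an assumption).
mpt_version_from_include() {
  printf '%s\n' "$1" | sed -n 's|.*/opt/mpt/\([0-9][0-9.]*\)/.*|\1|p'
}

# Hypothetical helper: pull the version out of a module name such as
# xt-mpt/3.4.0 by stripping the "xt-mpt/" prefix.
mpt_version_from_module() {
  printf '%s\n' "${1#xt-mpt/}"
}

# Compare the two versions; print a message and return non-zero on mismatch.
check_mpt_include() {
  inc_ver=$(mpt_version_from_include "$1")
  mod_ver=$(mpt_version_from_module "$2")
  if [ "$inc_ver" = "$mod_ver" ]; then
    echo "ok: include matches module ($mod_ver)"
  else
    echo "mismatch: include uses $inc_ver, module is $mod_ver"
    return 1
  fi
}
```

on the real system the second argument would come from whatever 'module list' reports; parsing that output is left out here because its format is not shown in this thread.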
during the compile with the old include, i did notice some warnings
about type conversion in some of the print routines.
so i/m not sure exactly what/s going on.
anyway things are working again.
thanks again.
michael
On Jan 23, 2010, at 4:40 PM, Jean-Michel Campin wrote:
> Hi Michael,
>
> Do you have modified source code (usually in the dir listed after
> the "-mods" argument of genmake2) ? It could be that one of those
> fortran files needs to be updated because of main code changes.
> A list (or a tar file) of your modified source code dir could
> be useful.
>
>>> And could you try to run one of
>>> the simple verification experiments with MPI.
> This would also still be helpful. You could just pick a simple one
> (e.g., exp2:
> cd verification/exp2/build
> cp ../code/SIZE.h_mpi .
> and then the same way you build and run your code:
> genmake2 , make depend , make ... )
>
> Jean-Michel
>
> On Sat, Jan 23, 2010 at 01:01:14PM -0600, m. r. schaferkotter wrote:
>> thanks JM.
>>
>> a) it was a clean make. i updated last night.
>>
>> here more info.
>>
>> previous version:
>>
>> (PID.TID 0000.0001) // MITgcmUV version: checkpoint61v
>> (PID.TID 0000.0001) // Build user: schaferk
>> (PID.TID 0000.0001) // Build host: sapphire01
>> (PID.TID 0000.0001) // Build date: Thu Jan 21 18:01:07 CST 2010
>>
>> environment:
>>
>> schaferk:sapphire01% uname -a
>> Linux sapphire01 2.6.16.54-0.2.12_1.0101.4789.0-ss #1 SMP Thu Nov 12 18:02:52 CST 2009 x86_64 x86_64 x86_64 GNU/Linux
>>
>> build script:
>> linux_amd64_pgf90_sapphire
>>
>> FC='ftn'
>> CC='cc'
>> CPP='cpp -P -traditional'
>>
>> DEFINES='-DWORDLENGTH=4 -DNML_TERMINATOR -DALLOW_USE_MPI -DALWAYS_USE_MPI -DTARGET_CRAYXT'
>>
>> INCLUDES="-I/opt/mpt/3.2.0/xt/mpich2-pgi64/include"
>>
>> FFLAGS='-byteswapio -r8 -Mnodclchk -Mextend -fPIC'
>> FOPTIM='-O3 -fastsse -tp k8-64 -pc=64 -Msmart -Mipa=fast'
>>
>> CFLAGS='-O3 -fastsse -fPIC'
>>
>>
>>
>> packages:
>>
>> data.pkg
>>
>> # Packages
>> &PACKAGES
>> useOBCS=.TRUE.,
>> useDiagnostics=.TRUE.,
>> useMNC=.FALSE.,
>> &
>>
>> packages.conf
>>
>> debug
>> generic_advdiff
>> kpp
>> mdsio
>> mom_fluxform
>> mom_vecinv
>> monitor
>> obcs
>> rw
>> timeave
>> cal
>> exf
>> diagnostics
>>
>>
>> comments:
>>
>> nothing strange in the genmake_warnings other than remarks about
>> netcdf
>> includes, which i/m not using and did _not_ use in the previous build
>> which runs.
>>
>> this is the last part of the STDOUT.0000 file:
>> (PID.TID 0000.0001) // Model current state
>> (PID.TID 0000.0001) // =======================================================
>> (PID.TID 0000.0001)
>> (PID.TID 0000.0001) // =======================================================
>> (PID.TID 0000.0001) // Begin MONITOR dynamic field statistics
>> (PID.TID 0000.0001) // =======================================================
>> (PID.TID 0000.0001) %MON time_tsnumber = 0
>> (PID.TID 0000.0001) %MON time_secondsf = 0.0000000000000E+00
>>
>>
>> On Jan 23, 2010, at 11:49 AM, Jean-Michel Campin wrote:
>>
>>> Hi Michael,
>>>
>>> One thing that could be useful to know is what was the version of
>>> the code you updated from (or when did you do the previous update).
>>> Otherwise, the code is tested everyday, with and without MPI,
>>> and all the tests from last night went normally.
>>> It can be caused by:
>>> a) something in the build process. Did you try a full "make Clean"
>>> before making the new executable ? And could you try to run one of
>>> the simple verification experiments with MPI.
>>> b) or something in pieces of code that we don't test (would need to know
>>> more about the type of set-up/packages/options you are using).
>>>
>>> Thanks,
>>> Jean-Michel
>>>
>>> On Sat, Jan 23, 2010 at 11:06:58AM -0600, m. r. schaferkotter wrote:
>>>> all;
>>>> i did cvs update yesterday (jan 22), and now i/m getting these error
>>>> messages after building and attempting to run my (moments earlier
>>>> successful) job.
>>>>
>>>> schaferk:sapphire01% more r.001.err
>>>> aborting job:
>>>> Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
>>>> MPI_Allreduce(714).......: MPI_Allreduce(sbuf=0x7fffffffb28c, rbuf=0x7fffffffb2ec, count=1, dtype=0x4c00081b, MPI_SUM, MPI_COMM_WORLD) failed
>>>> MPIR_SUM_check_dtype(388): MPI_Op MPI_SUM operation not defined for this datatype
>>>> aborting job:
>>>> Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
>>>> MPI_Allreduce(714).......: MPI_Allreduce(sbuf=0x7fffffffb28c, rbuf=0x7fffffffb2ec, count=1, dtype=0x4c00081b, MPI_SUM, MPI_COMM_WORLD) failed
>>>>
>>>>
>>>> fortunately, i moved aside the old executable and the job runs with
>>>> that.
>>>>
>>>>
>>>> what/s up with this?
>>>>
>>>> michael schaferkotter
>>>>
>>>>
>>>> _______________________________________________
>>>> MITgcm-support mailing list
>>>> MITgcm-support at mitgcm.org
>>>> http://mitgcm.org/mailman/listinfo/mitgcm-support