[MITgcm-support] mpi problem after cvs update

Jean-Michel Campin jmc at ocean.mit.edu
Sat Jan 23 17:40:34 EST 2010


Hi Michael,

Do you have modified source code (usually in the directory listed after
the "-mods" argument of genmake2)? It could be that one of those
Fortran files needs to be updated because of changes in the main code.
A list (or a tar file) of your modified source-code directory would
be useful.
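
In case it helps, a minimal way to package the modified code (just a
sketch, assuming it sits in a directory such as ../code; use whatever
path you normally pass to -mods):

 ls ../code
 tar -czvf my_mods.tar.gz ../code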

>> And could you try to run one of
>> the simple verification experiments with MPI?
This would also still be helpful. You could just pick a simple one
(e.g., exp2):
 cd verification/exp2/build
 cp ../code/SIZE.h_mpi .
and then build and run it the same way you build and run your own code
(genmake2, make depend, make ...).
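Spelled out, the whole sequence might look roughly like this (a sketch
only: it assumes SIZE.h_mpi is meant to end up as SIZE.h in the build
directory, and <your_optfile> stands for the build-options file you
normally pass to genmake2):

 cd verification/exp2/build
 cp ../code/SIZE.h_mpi SIZE.h
 ../../../tools/genmake2 -mpi -mods=../code -of=<your_optfile>
 make depend
 make

and then launch the resulting mitgcmuv executable the same way you
launch your own runs (mpirun, aprun, etc.).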

Jean-Michel

On Sat, Jan 23, 2010 at 01:01:14PM -0600, m. r. schaferkotter wrote:
> thanks JM.
>
> a) it was a clean make. i updated last night.
>
> here's more info.
>
> previous version:
>
> (PID.TID 0000.0001) // MITgcmUV version:  checkpoint61v
> (PID.TID 0000.0001) // Build user:        schaferk
> (PID.TID 0000.0001) // Build host:        sapphire01
> (PID.TID 0000.0001) // Build date:        Thu Jan 21 18:01:07 CST 2010
>
> environment:
>
> schaferk:sapphire01% uname -a
> Linux sapphire01 2.6.16.54-0.2.12_1.0101.4789.0-ss #1 SMP Thu Nov 12 18:02:52 CST 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> build script:
> linux_amd64_pgf90_sapphire
>
> FC='ftn'
> CC='cc'
> CPP='cpp -P -traditional'
>
> DEFINES='-DWORDLENGTH=4 -DNML_TERMINATOR -DALLOW_USE_MPI -DALWAYS_USE_MPI -DTARGET_CRAYXT'
>
> INCLUDES="-I/opt/mpt/3.2.0/xt/mpich2-pgi64/include"
>
> FFLAGS='-byteswapio -r8 -Mnodclchk -Mextend -fPIC'
> FOPTIM='-O3 -fastsse -tp k8-64 -pc=64 -Msmart -Mipa=fast'
>
> CFLAGS='-O3 -fastsse -fPIC'
>
>
>
> packages:
>
> data.pkg
>
> # Packages
>  &PACKAGES
>  useOBCS=.TRUE.,
>  useDiagnostics=.TRUE.,
>  useMNC=.FALSE.,
>  &
>
> packages.conf
>
> debug
> generic_advdiff
> kpp
> mdsio
> mom_fluxform
> mom_vecinv
> monitor
> obcs
> rw
> timeave
> cal
> exf
> diagnostics
>
>
> comments:
>
> nothing strange in the genmake_warnings other than remarks about netcdf 
> includes, which i'm not using and did _not_ use in the previous build 
> which runs.
>
> this is the last part of the STDOUT.0000 file:
> (PID.TID 0000.0001) // Model current state
> (PID.TID 0000.0001) //  =======================================================
> (PID.TID 0000.0001)
> (PID.TID 0000.0001) //  =======================================================
> (PID.TID 0000.0001) // Begin MONITOR dynamic field statistics
> (PID.TID 0000.0001) //  =======================================================
> (PID.TID 0000.0001) %MON time_tsnumber                =                     0
> (PID.TID 0000.0001) %MON time_secondsf                =     0.0000000000000E+00
>
>
> On Jan 23, 2010, at 11:49 AM, Jean-Michel Campin wrote:
>
>> Hi Michael,
>>
>> One thing that could be useful to know is which version of the code
>> you updated from (or when you did the previous update).
>> Otherwise, the code is tested every day, with and without MPI,
>> and all the tests from last night went normally.
>> This could be caused by:
>> a) something in the build process. Did you try a full "make Clean"
>> before making the new executable? And could you try to run one of
>> the simple verification experiments with MPI?
>> b) something in pieces of code that we don't test (we would need to
>> know more about the type of set-up/packages/options you are using).
>>
>> Thanks,
>> Jean-Michel
>>
>> On Sat, Jan 23, 2010 at 11:06:58AM -0600, m. r. schaferkotter wrote:
>>> all;
>>> i did cvs update yesterday (jan 22), and now i'm getting these error
>>> messages after building and attempting to run my (moments earlier
>>> successful) job.
>>>
>>> schaferk:sapphire01% more r.001.err
>>> aborting job:
>>> Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
>>> MPI_Allreduce(714).......: MPI_Allreduce(sbuf=0x7fffffffb28c,
>>> rbuf=0x7fffffffb2ec, count=1, dtype=0x4c00081b, MPI_SUM,
>>> MPI_COMM_WORLD) failed
>>> MPIR_SUM_check_dtype(388): MPI_Op MPI_SUM operation not defined for
>>> this datatype
>>> aborting job:
>>> Fatal error in MPI_Allreduce: Invalid MPI_Op, error stack:
>>> MPI_Allreduce(714).......: MPI_Allreduce(sbuf=0x7fffffffb28c,
>>> rbuf=0x7fffffffb2ec, count=1, dtype=0x4c00081b, MPI_SUM,
>>> MPI_COMM_WORLD) failed
>>>
>>>
>>> fortunately, i moved aside the old executable and the job runs with
>>> that.
>>>
>>>
>>> what's up with this?
>>>
>>> michael schaferkotter
>>>
>>>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support


