[MITgcm-support] Error on Cheyenne HPC with diagnostics package

Martin Losch Martin.Losch at awi.de
Mon Nov 19 10:06:49 EST 2018


Hi Takaya,

most likely the output you get is not up to date with the model, probably because your system keeps a large I/O buffer that is not flushed to disk when the model crashes. If you can find a way to make the system flush everything the model has written to the buffer before it terminates, you'd probably see where the model stops.
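
One thing that might help (a sketch, assuming the Intel Fortran runtime and SGI MPT launcher from your optfile; untested on Cheyenne): disable buffered Fortran I/O before launching the job, so that the STDOUT.* and STDERR.* files are on disk when the job dies:

  # Intel Fortran runtime switch: do file I/O unbuffered
  export FORT_BUFFERED=false
  mpiexec_mpt ./mitgcmuv
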
If things work with useDiagnostics = .FALSE. but not with .TRUE., then you'll have to search in the diagnostics package, or maybe in data.diagnostics itself: a misspelled variable name, a missing comma at the end of a line …, things like that.
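
For reference, a minimal data.diagnostics that parses cleanly looks something like this (just a sketch with made-up output choices; field names are 8-character strings padded with blanks, and every entry ends with a comma):

  &DIAGNOSTICS_LIST
   fields(1:2,1) = 'THETA   ','SALT    ',
   fileName(1)   = 'tsDiag',
   frequency(1)  = 86400.,
  &

  &DIAG_STATIS_PARMS
  &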

Martin

> On 19. Nov 2018, at 15:48, Uchida Takaya <tu2140 at columbia.edu> wrote:
> 
> Hi Martin,
> 
> 
> Thank you for getting back to me.
> I do have the eedata file in the directory and what I have in the file is:
> 
> What is weird to me is that the run works fine when the diagnostics package is turned off, but fails when it is turned on, with no errors related to the package...
> 
> 
> Best,
> Takaya
> ———————
> PhD Candidate
> Physical Oceanography
> Columbia University in the City of New York
> https://roxyboy.github.io/
> 
>> On Nov 19, 2018, at 5:28 AM, Martin Losch <Martin.Losch at awi.de> wrote:
>> 
>> Hi Takaya,
>> 
>> the job log says that the executable is looking for a file and cannot find it. And the traceback even gives a good clue as to which file it is:
>> eeset_parms.F reads the file “eedata”. Are you sure that it is in your run directory?
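>> 
>> In case it helps, a minimal eedata for a single-threaded-per-process MPI run looks like this (just a sketch; if I remember correctly, setting debugMode in it also makes the model print subroutine entry/exit messages, which helps to locate a crash):
>> 
>>  &EEPARMS
>>  nTx=1,
>>  nTy=1,
>>  debugMode=.TRUE.,
>>  &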
>> 
>> Martin
>> 
>>> On 15. Nov 2018, at 23:48, Uchida Takaya <tu2140 at columbia.edu> wrote:
>>> 
>>> Dear MITgcm support,
>>> 
>>> I have a run on Cheyenne that runs fine without the diagnostics package but fails within seconds once I turn the package on, and it gives me no useful error messages except for:
>>> 
>>> MPT ERROR: MPI_COMM_WORLD rank 1589 has terminated without calling MPI_Finalize()
>>> 	aborting job
>>> 
>>> I use the same namelist files, which run fine on the Columbia University Habanero cluster, so my expectation was that everything should work here as well. I am a bit lost and would like to know if others have had similar issues running MITgcm on Cheyenne.
>>> 
>>> The optfile I use, which can also be found together with the namelist files here ( https://github.com/roxyboy/ChannelMOC_Cheyenne/tree/master/SO_only-physics/channel_flat ), is:
>>> 
>>> module load intel/17.0.1 mpt/2.15f netcdf/4.6.1
>>> 
>>> FC=mpif90
>>> CC=mpicc
>>> F90C=mpif90
>>> 
>>> DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -DWORDLENGTH=4'
>>> CPP='/lib/cpp  -traditional -P'
>>> EXTENDED_SRC_FLAG='-132'
>>> OMPFLAG='-openmp'
>>> CFLAGS='-fPIC'
>>> LDADD='-shared-intel'
>>> 
>>> LIBS="-L${MPI_ROOT}/lib"
>>> INCLUDES="-I${MPI_ROOT}/include"
>>> NOOPTFLAGS='-O0 -fPIC'
>>> 
>>> #FFLAGS="-fPIC -convert big_endian -assume byterecl -align -xCORE-AVX2" # 4% slower with -O2
>>> FFLAGS="-fPIC -convert big_endian -assume byterecl -align"
>>> FDEBUG='-W0 -WB'
>>> FFLAGS="$FDEBUG $FFLAGS"
>>> 
>>> FOPTIM='-O3'
>>> FOPTIM="$FOPTIM -ip -fp-model precise -traceback -ftz"
>>> 
>>> Any advice would be appreciated.
>>> 
>>> Thank you,
>>> Takaya
>>> ———————
>>> PhD Candidate
>>> Physical Oceanography
>>> Columbia University in the City of New York
>>> https://roxyboy.github.io/
>>> 
>> 
> 
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support


