[MITgcm-support] Error on Cheyenne HPC with diagnostics package

Uchida Takaya tu2140 at columbia.edu
Thu Nov 15 18:40:45 EST 2018


Dear MITgcm support,

I’m sorry for the consecutive emails. To follow up on my previous email about runs on Cheyenne, here are the tail of STDOUT and the job log (the STDERR files are empty).

STDOUT:
(PID.TID 0806.0001)                   /*  other model components, through a coupler */
(PID.TID 0806.0001) debugMode =    T ; /* print debug msg. (sequence of S/R calls)  */
(PID.TID 0806.0001) printMapIncludesZeros=    F ; /* print zeros in Std.Output maps */
(PID.TID 0806.0001) maxLengthPrt1D=   65 /* maxLength of 1D array printed to StdOut */
(PID.TID 0806.0001) 

Job log:
forrtl: No such file or directory
forrtl: No such file or directory
forrtl: severe (28): CLOSE error, unit 11, file "Unknown"
Image              PC                Routine            Line        Source             
libifcoremt.so.5   00002AAAABB45A2A  for__io_return        Unknown  Unknown
libifcoremt.so.5   00002AAAABB3719B  for_close             Unknown  Unknown
mitgcmuv           0000000000558567  eeset_parms_             2141  eeset_parms.f
mitgcmuv           00000000005565AE  eeboot_                  1825  eeboot.f
mitgcmuv           00000000005AA6F3  MAIN__                   4516  main.f
mitgcmuv           0000000000402ABE  Unknown               Unknown  Unknown
libc-2.19.so       00002AAAADA1DB25  __libc_start_main     Unknown  Unknown
mitgcmuv           00000000004029C9  Unknown               Unknown  Unknown
forrtl: severe (28): CLOSE error, unit 11, file "Unknown"
Image              PC                Routine            Line        Source             
libifcoremt.so.5   00002AAAABB45A2A  for__io_return        Unknown  Unknown
libifcoremt.so.5   00002AAAABB3719B  for_close             Unknown  Unknown
mitgcmuv           0000000000558567  eeset_parms_             2141  eeset_parms.f
mitgcmuv           00000000005565AE  eeboot_                  1825  eeboot.f
mitgcmuv           00000000005AA6F3  MAIN__                   4516  main.f
mitgcmuv           0000000000402ABE  Unknown               Unknown  Unknown
libc-2.19.so       00002AAAADA1DB25  __libc_start_main     Unknown  Unknown
mitgcmuv           00000000004029C9  Unknown               Unknown  Unknown
forrtl: No such file or directory
forrtl: severe (28): CLOSE error, unit 11, file "Unknown"
Image              PC                Routine            Line        Source             
libifcoremt.so.5   00002AAAABB45A2A  for__io_return        Unknown  Unknown
libifcoremt.so.5   00002AAAABB3719B  for_close             Unknown  Unknown
mitgcmuv           0000000000558567  eeset_parms_             2141  eeset_parms.f
mitgcmuv           00000000005565AE  eeboot_                  1825  eeboot.f
mitgcmuv           00000000005AA6F3  MAIN__                   4516  main.f
mitgcmuv           0000000000402ABE  Unknown               Unknown  Unknown
libc-2.19.so       00002AAAADA1DB25  __libc_start_main     Unknown  Unknown
mitgcmuv           00000000004029C9  Unknown               Unknown  Unknown
MPT ERROR: MPI_COMM_WORLD rank 1589 has terminated without calling MPI_Finalize()
        aborting job

Thank you,
Takaya
———————
PhD Candidate
Physical Oceanography
Columbia University in the City of New York
https://roxyboy.github.io/

> On Nov 15, 2018, at 5:48 PM, Uchida Takaya <tu2140 at columbia.edu> wrote:
> 
> Dear MITgcm support,
> 
> I have a run on Cheyenne that works fine with the diagnostics package turned off but fails within seconds once I turn the package on. The only error it gives me is:
> 
> MPT ERROR: MPI_COMM_WORLD rank 1589 has terminated without calling MPI_Finalize()
> 	aborting job
> 
> I use the same namelist files that run fine on the Columbia University Habanero cluster, so I expected them to work here as well. I am a bit lost and would like to know whether others have had related issues running MITgcm on Cheyenne.
> 
> The optfile I use, which can also be found here ( https://github.com/roxyboy/ChannelMOC_Cheyenne/tree/master/SO_only-physics/channel_flat ) including the namelist files, is:
> 
> module load intel/17.0.1 mpt/2.15f netcdf/4.6.1
> 
> FC=mpif90
> CC=mpicc
> F90C=mpif90
> 
> DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -DWORDLENGTH=4'
> CPP='/lib/cpp  -traditional -P'
> EXTENDED_SRC_FLAG='-132'
> OMPFLAG='-openmp'
> CFLAGS='-fPIC'
> LDADD='-shared-intel'
> 
> LIBS="-L${MPI_ROOT}/lib"
> INCLUDES="-I${MPI_ROOT}/include"
> NOOPTFLAGS='-O0 -fPIC'
> 
> #FFLAGS="-fPIC -convert big_endian -assume byterecl -align -xCORE-AVX2" # 4% slower with -O2
> FFLAGS="-fPIC -convert big_endian -assume byterecl -align"
> FDEBUG='-W0 -WB'
> FFLAGS="$FDEBUG $FFLAGS"
> 
> FOPTIM='-O3'
> FOPTIM="$FOPTIM -ip -fp-model precise -traceback -ftz"
> 
> Any advice would be appreciated.
> 
> Thank you,
> Takaya
> ———————
> PhD Candidate
> Physical Oceanography
> Columbia University in the City of New York
> https://roxyboy.github.io/
