[MITgcm-devel] model crashes when opening scratch units

Martin Losch Martin.Losch at awi.de
Tue Jul 18 08:14:07 EDT 2017


Hi all,

I am following up on an old post. Is there a good reason for having
       OPEN(UNIT=scrUnit1,STATUS='SCRATCH')
(in open_copy_data_file.F and eeset_parms.F) without specifying a file name? In other words, wouldn’t it make sense to replace

# if defined (TARGET_BGL) || defined (TARGET_CRAYXT)
      WRITE(scratchFile1,'(A,I4.4)') 'scratch1.', myProcId
      WRITE(scratchFile2,'(A,I4.4)') 'scratch2.', myProcId
      OPEN(UNIT=scrUnit1, FILE=scratchFile1, STATUS='UNKNOWN')
      OPEN(UNIT=scrUnit2, FILE=scratchFile2, STATUS='UNKNOWN')
# else
      OPEN(UNIT=scrUnit1,STATUS='SCRATCH')
      OPEN(UNIT=scrUnit2,STATUS='SCRATCH')
# endif

by
      WRITE(scratchFile1,'(A,I4.4)') 'scratch1.', myProcId
      WRITE(scratchFile2,'(A,I4.4)') 'scratch2.', myProcId
      OPEN(UNIT=scrUnit1, FILE=scratchFile1, STATUS='SCRATCH')
      OPEN(UNIT=scrUnit2, FILE=scratchFile2, STATUS='SCRATCH')
and get rid of the TARGET_CRAYXT option altogether (I think it’s the only place where it is used in the code)? One would probably have to change the integer format to '(A,I6.6)' or so, since I4.4 cannot represent process numbers above 9999.
Specifying the status “SCRATCH” evidently produces non-unique file names that cause the model to crash in some cases (I just had another such crash on our computer, which left a beginning PhD student very puzzled).
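A minimal stand-alone sketch of the per-process naming idea, assuming a hypothetical rank number of 579 in place of myProcId and a hypothetical unit number 11. One caveat: standard Fortran does not allow FILE= together with STATUS='SCRATCH' (some compilers reject that combination at run time), so this sketch opens the named file with STATUS='REPLACE' and removes it with CLOSE(..., STATUS='DELETE'), which would also avoid leftover scratch* files. It is written in free-form Fortran rather than the fixed-form style of the MITgcm sources.

```fortran
program scratch_demo
  implicit none
  ! Hypothetical stand-ins for the MITgcm variables scrUnit1 and myProcId.
  integer, parameter :: scrUnit1 = 11
  integer :: myProcId
  character(len=16) :: scratchFile1
  character(len=32) :: line

  myProcId = 579
  ! I6.6 gives a zero-padded, per-process name (here 'scratch1.000579');
  ! with I4.4 any process number above 9999 would be written as '****'.
  write(scratchFile1, '(A,I6.6)') 'scratch1.', myProcId

  ! FILE= may not appear with STATUS='SCRATCH', so open a named file
  ! instead and delete it explicitly once it is no longer needed.
  open(unit=scrUnit1, file=scratchFile1, status='REPLACE', &
       action='READWRITE', form='FORMATTED')
  write(scrUnit1, '(A)') 'scratch test line'
  rewind(scrUnit1)
  read(scrUnit1, '(A)') line
  close(scrUnit1, status='DELETE')

  print '(A)', trim(scratchFile1)
  print '(A)', trim(line)
end program scratch_demo
```

Compiled with, e.g., gfortran, this should print scratch1.000579 followed by the line read back from the file, and no scratch file should remain afterwards.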

Martin




> On 17. Apr 2015, at 10:06, Martin Losch <Martin.Losch at awi.de> wrote:
> 
> Hi Matt,
> 
> thanks, that’s the flag I was looking for; it works for me too (except that it leaves behind all the “scratch*” files).
> 
> M.
> 
>> On 16 Apr 2015, at 18:56, Matthew Mazloff <mmazloff at ucsd.edu> wrote:
>> 
>> Hi Martin
>> 
>> Yes, I sometimes get this when running with lots of cores, so I don't think Cray_XC30 is the only platform where this is an issue.
>> 
>> However, defining 
>> #define TARGET_CRAYXT
>> in CPP_EEOPTIONS.h
>> always fixes it
>> 
>> Matt
>> 
>> On Apr 16, 2015, at 8:16 AM, Martin Losch <Martin.Losch at awi.de> wrote:
>> 
>>> Hi there,
>>> 
>>> this is probably not directly related to the MITgcm, but when I try to run the model on ECMWF’s Cray_XC30, I sometimes (it depends a little on the number of processors) get this type of error message:
>>> 
>>>> lib-4051 : UNRECOVERABLE library error 
>>>> The file must not exist prior to OPEN if STATUS is 'NEW'.
>>>> 
>>>> Encountered during an OPEN of unit 11
>>>> Fortran unit 11 is not connected
>>>> 
>>>> lib-4051 : UNRECOVERABLE library error 
>>>> The file must not exist prior to OPEN if STATUS is 'NEW'.
>>>> 
>>>> Encountered during an OPEN of unit 11
>>>> Fortran unit 11 is not connected
>>>> Application 56285485 is crashing. ATP analysis proceeding...
>>>> 
>>>> ATP Stack walkback for Rank 579 starting:
>>>> _start at start.S:113
>>>> __libc_start_main at libc-start.c:242
>>>> main at main.f:4353
>>>> eeboot_ at eeboot.f:1583
>>>> eeset_parms_ at eeset_parms.f:1821
>>>> _OPEN at 0xa3a47d
>>>> __OPN at 0xa3a22c
>>>> _f_open at 0xa380a4
>>>> _ferr at 0xa33d6a
>>>> abort at abort.c:92
>>>> raise at pt-raise.c:42
>>>> ATP Stack walkback for Rank 579 done
>>> 
>>> The line in eeset_parms.f is the one where the first scratch unit is opened.
>>>    OPEN(UNIT=scrUnit1,STATUS='SCRATCH')
>>>    OPEN(UNIT=scrUnit2,STATUS='SCRATCH')
>>> 
>>> Has anyone had this problem? Is this a hardware bug?
>>> 
>>> Martin
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>> 
>> 
> 



