[MITgcm-support] File reading error on XT5

David Hebert david.hebert.ctr at nrlssc.navy.mil
Thu Jan 22 17:10:07 EST 2009


Hi Matt,

Thanks for the help. It looks like I need to add 2 WRITE statements and 
2 OPEN statements? So in the end, are there supposed to be 4 OPEN 
statements? Or am I supposed to replace

OPEN(UNIT=scrUnit1,FILE='scratch1',STATUS='UNKNOWN')

with

OPEN(UNIT=scrUnit1,FILE=scrname1,STATUS='UNKNOWN')
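
For reference, the per-process naming can be tried outside the model 
with a minimal standalone sketch. This is not MITgcm code: the program 
name, the unit numbers, the value of MAX_LEN_FNAM and the hand-set 
myProcessStr are all assumptions made for illustration (in the model, 
myProcessStr would come from EESUPPORT.h, as in Matt's note below).

```fortran
! Standalone sketch (not MITgcm code): build per-process scratch file
! names the same way as the WRITE lines in the hack, then open them.
      PROGRAM SCRNAMES
      IMPLICIT NONE
! Assumed value; the model defines MAX_LEN_FNAM in its own headers.
      INTEGER MAX_LEN_FNAM
      PARAMETER ( MAX_LEN_FNAM = 512 )
      CHARACTER*(MAX_LEN_FNAM) scrname1
      CHARACTER*(MAX_LEN_FNAM) scrname2
! Hypothetical stand-in for the per-process string from EESUPPORT.h.
      CHARACTER*(10) myProcessStr
      INTEGER scrUnit1, scrUnit2
! Assumed unit numbers, chosen only for this example.
      scrUnit1 = 11
      scrUnit2 = 12
! Pretend this is process 3.
      myProcessStr = '0003'
! Same name construction as in the hack: scratch + process id + suffix.
      WRITE(scrname1,'(3a)') 'scratch',myProcessStr(1:4),'_1'
      WRITE(scrname2,'(3a)') 'scratch',myProcessStr(1:4),'_2'
! These OPENs take the place of the fixed 'scratch1'/'scratch2' names.
      OPEN(UNIT=scrUnit1,FILE=scrname1,STATUS='UNKNOWN')
      OPEN(UNIT=scrUnit2,FILE=scrname2,STATUS='UNKNOWN')
! The names are 13 characters long: 7 + 4 + 2.
      PRINT *, scrname1(1:13)
      PRINT *, scrname2(1:13)
! Delete the files on close so the directory is left clean.
      CLOSE(scrUnit1,STATUS='DELETE')
      CLOSE(scrUnit2,STATUS='DELETE')
      END
```

Because each process writes to a file carrying its own id, a process 
that is running ahead can no longer overwrite the scratch file another 
process is still reading.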

Thanks again,

David

Matthew Mazloff wrote:
> Hi David,
>
> Yeah, that error usually only comes up on large adjoint runs, for 
> whatever reason.
>
> It means your MPI barriers are not working properly and basically one 
> processor is ahead of another.
> One processor has finished reading a namelist parameter file (e.g. in 
> ini_parms) and has written the next scratch file, perhaps for data.exf 
> or something, while another one is still in ini_parms. That one goes 
> to read the scratch file thinking it holds "data" when it actually 
> holds data.exf, and it crashes.
>
> I have a hack that makes every processor write its own scratch file, 
> and then there is no issue. The only drawback is that it will fill 
> your directory with scratch files.
>
> Here's the hack I did that you can use until someone fixes this properly.
>
> First #define TARGET_BGL in ECCO_CPPOPTIONS.h, or I guess 
> CPP_OPTIONS.h if you don't use ECCO. Or, as you did before, just 
> compile with -DTARGET_CRAYXT.
>
> In OPEN_COPY_DATA_FILE add
>
> #include "EESUPPORT.h"
>
>       CHARACTER*(MAX_LEN_FNAM) scrname1
>       CHARACTER*(MAX_LEN_FNAM) scrname2
>
> and then incorporate the CMM( ... ) stuff
>
> #if defined (TARGET_BGL) || defined (TARGET_CRAYXT)
> CMM(
>       WRITE(scrname1,'(3a)') 'scratch',myProcessStr(1:4),'_1'
>       WRITE(scrname2,'(3a)') 'scratch',myProcessStr(1:4),'_2'
> CMM)
>       OPEN(UNIT=scrUnit1,FILE=scrname1,STATUS='UNKNOWN')
>       OPEN(UNIT=scrUnit2,FILE=scrname2,STATUS='UNKNOWN')
> CMM      OPEN(UNIT=scrUnit1,FILE='scratch1',STATUS='UNKNOWN')
> CMM      OPEN(UNIT=scrUnit2,FILE='scratch2',STATUS='UNKNOWN')
> #else
>       OPEN(UNIT=scrUnit1,STATUS='SCRATCH')
>       OPEN(UNIT=scrUnit2,STATUS='SCRATCH')
> #endif
>
>
> You may also find this issue with ini_parms.F and eeset_parms.F so you 
> probably want to incorporate the code there too
>
> I think this is your issue....of course, I could be wrong :-)
> -Matt
>
>
>
> On Jan 22, 2009, at 12:04 PM, David Hebert wrote:
>
>> Hi everyone,
>>
>> I have recently got some time on a Cray XT5. Every now and then I 
>> seem to get this error message I can't seem to figure out...
>>
>> PGFIO/stdio: No such file or directory
>> PGFIO-F-/OPEN/unit=12/error code returned by host stdio - 2.
>> In source file open_copy_data_file.f, at line number 616
>>
>>
>> Is this a namelist reading issue? Has anyone else come across this 
>> issue? It seems the problem is intermittent and not always 
>> replicated. Compiling with -DTARGET_CRAYXT does not seem to fix the 
>> issue. Any help/suggestions are appreciated!
>>
>> Thanks
>>
>> David
>>
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>


