[MITgcm-support] File reading error on XT5
David Hebert
david.hebert.ctr at nrlssc.navy.mil
Thu Jan 22 17:10:07 EST 2009
Hi Matt,
Thanks for the help. It looks like I am to add 2 write statements and 2
open statements? So in the end, are there supposed to be 4 open
statements? Or am I supposed to replace
OPEN(UNIT=scrUnit1,FILE='scratch1',STATUS='UNKNOWN')
with
OPEN(UNIT=scrUnit1,FILE=scrname1,STATUS='UNKNOWN')
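To check my understanding, here is a small standalone sketch of the second reading: two WRITE statements build a per-process file name, and two OPEN statements use those names in place of the fixed 'scratch1' and 'scratch2'. This is not the MITgcm routine itself. The program name scrdemo, the variables myRank and procStr, and the unit numbers 11 and 12 are made up for illustration, and I am assuming myProcessStr(1:4) holds a zero-padded process number, which procStr imitates here.

```fortran
! Standalone sketch, not MITgcm code: one scratch-file pair per process.
! myRank and procStr are hypothetical stand-ins for myProcessStr(1:4).
      PROGRAM scrdemo
      IMPLICIT NONE
      INTEGER myRank, scrUnit1, scrUnit2
      CHARACTER(LEN=4) procStr
      CHARACTER(LEN=80) scrname1, scrname2
! Illustrative values only
      myRank = 3
      scrUnit1 = 11
      scrUnit2 = 12
! Zero-padded, 4-character process string (assumed layout)
      WRITE(procStr,'(I4.4)') myRank
! The two WRITE statements build the per-process names
      WRITE(scrname1,'(3a)') 'scratch', procStr, '_1'
      WRITE(scrname2,'(3a)') 'scratch', procStr, '_2'
! The two OPEN statements replace the ones using 'scratch1'/'scratch2'
      OPEN(UNIT=scrUnit1,FILE=scrname1,STATUS='UNKNOWN')
      OPEN(UNIT=scrUnit2,FILE=scrname2,STATUS='UNKNOWN')
      WRITE(scrUnit1,'(a)') ' &PARM01'
      PRINT '(2a)', 'opened ', TRIM(scrname1)
      PRINT '(2a)', 'opened ', TRIM(scrname2)
! Delete the files on close so the demo leaves nothing behind
      CLOSE(scrUnit1,STATUS='DELETE')
      CLOSE(scrUnit2,STATUS='DELETE')
      END PROGRAM scrdemo
```

With myRank = 3 the names come out as scratch0003_1 and scratch0003_2, so no two processes would share a scratch file. The files are deleted on CLOSE here only to keep the demo tidy.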
Thanks again,
David
Matthew Mazloff wrote:
> Hi David,
>
> Yeah, that error usually only comes up on large adjoint runs, for
> whatever reason.
>
> It means your MPI barriers are not working properly and basically one
> processor is ahead of another.
> One processor has finished reading a namelist parameter file (e.g. in
> ini_parms) and has written the next scratch file, perhaps for data.exf
> or something, while another one is still in ini_parms. That second
> processor goes to read the scratch file thinking it's "data" when it's
> really data.exf, and it crashes.
>
> I have a hack that makes every processor write its own scratch file,
> and then there is no issue. The only drawback is that it will fill
> your directory with scratch files.
>
> Here's the hack I did that you can use until someone fixes this properly.
>
> First #define TARGET_BGL in ECCO_CPPOPTIONS.h, or I guess CPP_OPTIONS.h
> if you don't use ECCO. Or, as you did before, just compile with
> -DTARGET_CRAYXT
>
> In OPEN_COPY_DATA_FILE add
>
> #include "EESUPPORT.h"
>
>       CHARACTER*(MAX_LEN_FNAM) scrname1
>       CHARACTER*(MAX_LEN_FNAM) scrname2
>
> and then incorporate the CMM( ... ) stuff
>
> #if defined (TARGET_BGL) || defined (TARGET_CRAYXT)
> CMM(
>       WRITE(scrname1,'(3a)') 'scratch',myProcessStr(1:4),'_1'
>       WRITE(scrname2,'(3a)') 'scratch',myProcessStr(1:4),'_2'
> CMM)
>       OPEN(UNIT=scrUnit1,FILE=scrname1,STATUS='UNKNOWN')
>       OPEN(UNIT=scrUnit2,FILE=scrname2,STATUS='UNKNOWN')
> CMM   OPEN(UNIT=scrUnit1,FILE='scratch1',STATUS='UNKNOWN')
> CMM   OPEN(UNIT=scrUnit2,FILE='scratch2',STATUS='UNKNOWN')
> #else
>       OPEN(UNIT=scrUnit1,STATUS='SCRATCH')
>       OPEN(UNIT=scrUnit2,STATUS='SCRATCH')
> #endif
>
>
> You may also hit this issue in ini_parms.F and eeset_parms.F, so you
> probably want to incorporate the code there too.
>
> I think this is your issue....of course, I could be wrong :-)
> -Matt
>
>
>
> On Jan 22, 2009, at 12:04 PM, David Hebert wrote:
>
>> Hi everyone,
>>
>> I have recently got some time on a Cray XT5. Every now and then I
>> seem to get this error message I can't seem to figure out...
>>
>> PGFIO/stdio: No such file or directory
>> PGFIO-F-/OPEN/unit=12/error code returned by host stdio - 2.
>> In source file open_copy_data_file.f, at line number 616
>>
>>
>> Is this a namelist reading issue? Has anyone else come across this
>> issue? It seems the problem is intermittent and not always
>> replicated. Compiling with -DTARGET_CRAYXT does not seem to fix the
>> issue. Any help/suggestions are appreciated!
>>
>> Thanks
>>
>> David
>>
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-support