[MITgcm-support] File reading error on XT5
david.hebert.ctr at nrlssc.navy.mil
Thu Jan 22 17:10:07 EST 2009
Thanks for the help. It looks like I am supposed to add 2 write statements and 2
open statements? So in the end, are there supposed to be 4 open
statements? Or am I supposed to replace
Matthew Mazloff wrote:
> Hi David,
> Yeah, that error usually only shows up on large adjoint runs, for
> whatever reason.
> It means your MPI barriers are not working properly and basically one
> processor is ahead of another.
> One processor has finished reading a namelist parameter file (e.g. in
> ini_parms) and has written the next scratch file, perhaps for data.exf or
> something, while another one is still in ini_parms, so it goes to read
> the scratch file thinking it's data when it's actually data.exf, and it crashes.
> I have a hack that makes every processor write its own scratch file,
> and then there is no issue. The only drawback is that it will fill your
> directory with scratch files.
> Here's the hack I did that you can use until someone fixes this properly.
> First #define TARGET_BGL in ECCO_CPPOPTIONS.h, or I guess CPP_OPTIONS
> if you don't use ECCO. Or, as you did before, just compile with
> In OPEN_COPY_DATA_FILE add
> #include "EESUPPORT.h"
> CHARACTER*(MAX_LEN_FNAM) scrname1
> CHARACTER*(MAX_LEN_FNAM) scrname2
> and then incorporate the CMM( ... ) stuff
> #if defined (TARGET_BGL) || defined (TARGET_CRAYXT)
> WRITE(scrname1,'(3a)') 'scratch',myProcessStr(1:4),'_1'
> WRITE(scrname2,'(3a)') 'scratch',myProcessStr(1:4),'_2'
> CMM OPEN(UNIT=scrUnit1,FILE='scratch1',STATUS='UNKNOWN')
> CMM OPEN(UNIT=scrUnit2,FILE='scratch2',STATUS='UNKNOWN')
> You may also find this issue with ini_parms.F and eeset_parms.F so you
> probably want to incorporate the code there too
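
For illustration, the per-process scratch file idea described above can be sketched as a small stand-alone program. This is only a sketch, not the MITgcm source: the program name, the unit numbers, the hard-coded process id, and the procStr variable (standing in for myProcessStr(1:4)) are all assumptions made for the example, and the guarded branch of the real routine may differ.

```fortran
! Stand-alone sketch of per-process scratch file names.
! NOT the MITgcm routine: unit numbers, the process id, and procStr
! (a stand-in for myProcessStr(1:4)) are assumptions for illustration.
program per_process_scratch
  implicit none
  integer, parameter :: MAX_LEN_FNAM = 512   ! assumed buffer length
  character(len=MAX_LEN_FNAM) :: scrname1, scrname2
  character(len=4) :: procStr
  integer :: myProcId, scrUnit1, scrUnit2

  myProcId = 3      ! hypothetical rank; in a real run this comes from MPI
  scrUnit1 = 11     ! assumed unit numbers
  scrUnit2 = 12

  ! Build a zero-padded process tag, e.g. '0003'
  write(procStr,'(i4.4)') myProcId

  ! Same naming pattern as the WRITE statements in the hack
  write(scrname1,'(3a)') 'scratch', procStr, '_1'
  write(scrname2,'(3a)') 'scratch', procStr, '_2'

  ! Each process opens files that no other process will touch,
  ! so a process running ahead cannot overwrite another's copy
  open(unit=scrUnit1, file=trim(scrname1), status='UNKNOWN')
  open(unit=scrUnit2, file=trim(scrname2), status='UNKNOWN')

  write(scrUnit1,'(a)') 'placeholder namelist copy'
  write(scrUnit2,'(a)') 'placeholder namelist copy'

  print '(2a)', 'opened ', trim(scrname1)
  print '(2a)', 'opened ', trim(scrname2)

  ! Deleted here only to keep the demo tidy; the hack as described
  ! leaves the files behind in the run directory
  close(scrUnit1, status='DELETE')
  close(scrUnit2, status='DELETE')
end program per_process_scratch
```

Because the file name depends on the process id, the shared names 'scratch1' and 'scratch2' are never used by more than one process, which is what removes the race; the cost is two extra files per process.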
> I think this is your issue... of course, I could be wrong :-)
> On Jan 22, 2009, at 12:04 PM, David Hebert wrote:
>> Hi everyone,
>> I have recently got some time on a Cray XT5. Every now and then I
>> seem to get this error message I can't seem to figure out...
>> PGFIO/stdio: No such file or directory
>> PGFIO-F-/OPEN/unit=12/error code returned by host stdio - 2.
>> In source file open_copy_data_file.f, at line number 616
>> Is this a namelist reading issue? Has anyone else come across this
>> issue? It seems the problem is intermittent and not always
>> replicated. Compiling with -DTARGET_CRAYXT does not seem to fix the
>> issue. Any help/suggestions are appreciated!
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org