[MITgcm-support] File reading error on XT5
Matthew Mazloff
mmazloff at MIT.EDU
Thu Jan 22 15:40:17 EST 2009
Hi David,
Yeah, that error usually only shows up on large adjoint runs, for
whatever reason.
It means your MPI barriers are not working properly and basically one
processor has gotten ahead of another.
One processor has finished reading a namelist parameter file (e.g. in
ini_parms) and has already written the next scratch file, perhaps for
data.exf or something, while another one is still in ini_parms. That
slower processor goes to read the scratch file thinking it holds its
own parameter file when it now holds data.exf, and it crashes.
I have a hack that makes every processor write its own scratch file,
and then there is no issue. The only drawback is that it will fill your
directory with scratch files.
Here's the hack I did, which you can use until someone fixes this
properly.
First #define TARGET_BGL in ECCO_CPPOPTIONS.h, or I guess CPP_OPTIONS.h
if you don't use ECCO. Or, as you did before, just compile with
-DTARGET_CRAYXT.
In OPEN_COPY_DATA_FILE add

#include "EESUPPORT.h"
      CHARACTER*(MAX_LEN_FNAM) scrname1
      CHARACTER*(MAX_LEN_FNAM) scrname2

and then incorporate the CMM( ... ) stuff:

#if defined (TARGET_BGL) || defined (TARGET_CRAYXT)
CMM(
      WRITE(scrname1,'(3a)') 'scratch',myProcessStr(1:4),'_1'
      WRITE(scrname2,'(3a)') 'scratch',myProcessStr(1:4),'_2'
CMM)
      OPEN(UNIT=scrUnit1,FILE=scrname1,STATUS='UNKNOWN')
      OPEN(UNIT=scrUnit2,FILE=scrname2,STATUS='UNKNOWN')
CMM   OPEN(UNIT=scrUnit1,FILE='scratch1',STATUS='UNKNOWN')
CMM   OPEN(UNIT=scrUnit2,FILE='scratch2',STATUS='UNKNOWN')
#else
      OPEN(UNIT=scrUnit1,STATUS='SCRATCH')
      OPEN(UNIT=scrUnit2,STATUS='SCRATCH')
#endif

You may also find this issue with ini_parms.F and eeset_parms.F, so
you probably want to incorporate the code there too.
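
As a rough standalone illustration of the naming scheme in the hack
(this is not MITgcm source), the sketch below builds the two
per-process scratch file names the same way and opens them. It is
written in free-form Fortran so it can be compiled on its own. The
names procStr and maxLenFnam are hypothetical stand-ins for
myProcessStr and MAX_LEN_FNAM, and the rank, the unit numbers, and the
I4.4 formatting of the rank are assumptions made for the example.
Closing with STATUS='DELETE' is an addition that is not part of the
hack; it is one possible way to keep the directory from filling up.

```fortran
! Standalone sketch, free-form Fortran. NOT MITgcm source.
! procStr and maxLenFnam are hypothetical stand-ins for
! myProcessStr and MAX_LEN_FNAM.
program per_process_scratch
  implicit none
  integer, parameter :: maxLenFnam = 512   ! assumed length
  character(len=maxLenFnam) :: scrname1, scrname2
  character(len=10) :: procStr
  integer :: myProcId, scrUnit1, scrUnit2

  myProcId = 7        ! assumed rank, set by hand for the example
  scrUnit1 = 11       ! assumed unit numbers
  scrUnit2 = 12

  ! Zero-padded, four-character process string (assumed format).
  write(procStr, '(I4.4)') myProcId

  ! Same name construction as in the hack: scratch<proc>_1 and _2,
  ! so two processes never share a scratch file name.
  write(scrname1, '(3a)') 'scratch', procStr(1:4), '_1'
  write(scrname2, '(3a)') 'scratch', procStr(1:4), '_2'

  open(unit=scrUnit1, file=trim(scrname1), status='UNKNOWN')
  open(unit=scrUnit2, file=trim(scrname2), status='UNKNOWN')

  print '(2a)', 'opened ', trim(scrname1)   ! e.g. scratch0007_1
  print '(2a)', 'opened ', trim(scrname2)   ! e.g. scratch0007_2

  ! Addition, not in the original hack: remove the files on close.
  close(scrUnit1, status='DELETE')
  close(scrUnit2, status='DELETE')
end program per_process_scratch
```

Whether deleting on close is practical inside the model depends on
where the scratch units actually get closed, which the hack above does
not cover.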
I think this is your issue... of course, I could be wrong :-)
-Matt
On Jan 22, 2009, at 12:04 PM, David Hebert wrote:
> Hi everyone,
>
> I have recently got some time on a Cray XT5. Every now and then I
> seem to get this error message I can't seem to figure out...
>
> PGFIO/stdio: No such file or directory
> PGFIO-F-/OPEN/unit=12/error code returned by host stdio - 2.
> In source file open_copy_data_file.f, at line number 616
>
>
> Is this a namelist reading issue? Has anyone else come across this
> issue? It seems the problem is intermittent and not always
> replicated. Compiling with -DTARGET_CRAYXT does not seem to fix the
> issue. Any help/suggestions are appreciated!
>
> Thanks
>
> David
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support