[MITgcm-support] Is this a bug in the MITgcm code?

Suneet Dwivedi suneetdwivedi at gmail.com
Fri May 9 12:12:26 EDT 2008


Hi Everyone,
When I tried to run the MITgcm adjoint model with the obcs package on, I
ended up with the following error message:
-------------------------------------------------------------------------------------------------------------------------------
(PID.TID 0000.0001) *** ERROR *** MDSREADFIELD_XZ_GL: File does not exist
--------------------------------------------------------------------------------------------------------------------------------
My model stopped at:
--------------------------------------------------------------------------------------------------------------------------------
(PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: opening file: maskobcsn.001.001.data
(PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: filename: maskobcsn.002.001.data
----------------------------------------------------------------------------------------------------------------------------------
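For anyone who wants to pull the offending file name out of a long STDOUT
log, here is a minimal Python sketch. It assumes that the last
"filename:" line printed by an MDSREADFIELD_* routine names the file that
could not be opened; that is my reading of the two lines above, not
something taken from the MITgcm source. The function name
last_requested_file is hypothetical.

```python
import re

# Matches log lines such as
# "(PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: filename: maskobcsn.002.001.data"
# Assumption: the routine name always starts with "MDSREADFIELD_".
FILENAME_RE = re.compile(r"MDSREADFIELD_\w+:\s+filename:\s+(\S+)")


def last_requested_file(log_lines):
    """Return the file name from the last 'filename:' line, or None."""
    name = None
    for line in log_lines:
        match = FILENAME_RE.search(line)
        if match:
            name = match.group(1)
    return name


if __name__ == "__main__":
    sample = [
        "(PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: opening file: maskobcsn.001.001.data",
        "(PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: filename: maskobcsn.002.001.data",
    ]
    print(last_requested_file(sample))
```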
A detailed description of the problem, and of the workaround that worked
for me, follows:
(i) The model works fine (stops with 'normal end') when I use one or two
processors on a single node.
(ii) The model crashes with the above error message when I use 16
processors spread over roughly 10 nodes.

When I looked for the missing files 'maskobcsn.002.001.data',
'maskobcsn.003.001.data' and so on, I found that they had been written
on nodes other than the master node, even though I had set
"useSingleCpuIo=.TRUE.". I then copied these files manually to the
master node, after which the model stopped at a different place, looking
for "adxx_obcsn.0000000000.002.001.data" and then for
"xx_obcsn.0000000000.002.001.data" and "pickup_obE.ckptA.002.002.data".
So I copied all the files associated with the obcs control to the master
node and reran the model, and this time it worked (stopped with 'normal
end'). This suggests that useSingleCpuIo=.TRUE. does not take effect for
the obcs control files. Is this a bug in MITgcm when obcs is used along
with ecco? Or am I missing some setting required for the obcs package to
write its output from a single CPU when the model runs on multiple
processors? Please help me sort out this problem.
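
In case it helps anyone reproduce the manual workaround, here is a rough
Python sketch of what I did by hand: list the per-tile files that are
missing from the master run directory and copy them in from the other
nodes. It is only a sketch under assumptions: the file names follow the
<prefix>.<bi>.<bj>.data pattern seen above, the other nodes' directories
are reachable as ordinary paths, and the function names and the tile
counts passed in are hypothetical, not part of MITgcm.

```python
import shutil
from pathlib import Path


def tile_file_names(prefix, n_tiles_x, n_tiles_y, suffix="data"):
    """Build names like 'maskobcsn.002.001.data' for every tile.

    Assumption: tile indices are 1-based and zero-padded to three digits,
    as in the file names quoted in this message.
    """
    return [
        f"{prefix}.{bi:03d}.{bj:03d}.{suffix}"
        for bj in range(1, n_tiles_y + 1)
        for bi in range(1, n_tiles_x + 1)
    ]


def missing_tile_files(run_dir, prefixes, n_tiles_x, n_tiles_y):
    """Return the expected tile files that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [
        name
        for prefix in prefixes
        for name in tile_file_names(prefix, n_tiles_x, n_tiles_y)
        if not (run_dir / name).exists()
    ]


def gather_tile_files(run_dir, source_dirs, names):
    """Copy each named file from the first source dir that has it.

    Returns the names that were actually copied.
    """
    run_dir = Path(run_dir)
    copied = []
    for name in names:
        for src in source_dirs:
            candidate = Path(src) / name
            if candidate.is_file():
                shutil.copy2(candidate, run_dir / name)
                copied.append(name)
                break
    return copied
```

With prefixes such as "maskobcsn", "xx_obcsn.0000000000",
"adxx_obcsn.0000000000" and "pickup_obE.ckptA", calling
missing_tile_files first shows what the master node lacks before
anything is copied. Files that are not needed on every tile would show
up as missing too, so the list should be checked by eye.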
Hoping for a reply,
Cheers,
Suneet


