[MITgcm-support] Re: is this bug in mitgcm code?

Matthew Mazloff mmazloff at MIT.EDU
Fri May 9 12:41:58 EDT 2008


Hi Suneet,

You are correct that the current version of the MITgcm does not allow  
useSingleCpuIo for obcs controls.

Patrick, you implemented useSingleCpuIo for the mdsio slice routines
a while back for me.  I have also changed them to write slices only
for the northern boundary, not global files.  I have attached them.
The issue with checking them in is that one should first remove the
change I made to the xz code (bracketed with CMM's), as it is not
robust and will crash for "useSingleCpuIo=.FALSE."

I am not sure what else may be wrong with the code, but it worked for
the northern boundary.

-Matt


-------------- next part --------------
A non-text attachment was scrubbed...
Name: mdsio_slice_routines.tar
Type: application/x-tar
Size: 153600 bytes
Desc: not available
URL: <http://mitgcm.org/pipermail/mitgcm-support/attachments/20080509/3c95d22b/attachment.tar>
-------------- next part --------------



On May 9, 2008, at 12:12 PM, Suneet Dwivedi wrote:

> Hi Everyone,
> When I tried to run the MITgcm adjoint model with the obcs package on,
> I ended up with the following error message:
> ----------------------------------------------------------------------
> (PID.TID 0000.0001) *** ERROR *** MDSREADFIELD_XZ_GL: File does not exist
> ----------------------------------------------------------------------
> My model stopped at:
> ----------------------------------------------------------------------
> (PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: opening file: maskobcsn.001.001.data
> (PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: filename: maskobcsn.002.001.data
> ----------------------------------------------------------------------
> A detailed description of the problem and the workaround (that worked
> for me) is as follows:
> (i) The model works fine (stops with 'normal end') when I use a single
> processor or two processors on a single node.
> (ii) The model crashes with the above error message when I start using
> 16 processors on maybe 10 nodes.
>
> When I looked for the missing files
> 'maskobcsn.002.001.data', 'maskobcsn.003.001.data' and so on, I found
> that they were written on nodes other than the master node even
> though I used "useSingleCpuIo=.TRUE." I then copied all these files
> manually to the master node, and after that the model stopped at a
> different place, looking for "adxx_obcsn.0000000000.002.001.data" and
> then for "xx_obcsn.0000000000.002.001.data" and
> "pickup_obE.ckptA.002.002.data". So I copied all the files associated
> with obcs control to the master node and reran the model, and it worked
> fine this time (stopped with normal end). It means that
> useSingleCpuIo=.TRUE. does not work for obcs control files. I wonder,
> is this a bug in the MITgcm when using obcs along with ecco? Am I
> missing some statement required for the obcs package to write output on
> a single CPU while running the model on multiple processors? Please
> help me sort out this problem.
> Hoping for a reply,
> Cheers,
> Suneet
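
For anyone applying the copy-to-master-node workaround described above, a
small check can list which per-tile files are still absent from the run
directory before the model is restarted. This is only a sketch under
stated assumptions: the naming pattern <prefix>.<tile x>.<tile y>.data
with three-digit, one-based tile indices is inferred from the file names
quoted in this thread, and the function names, arguments and tile counts
are hypothetical, not part of the MITgcm.

```python
from pathlib import Path


def expected_tile_files(prefix, n_tiles_x, n_tiles_y, suffix="data"):
    """Build per-tile file names such as 'maskobcsn.002.001.data'.

    Assumption: tiles are numbered from 1 and written as three-digit,
    zero-padded indices, as in the file names quoted in the message.
    """
    return [
        f"{prefix}.{bi:03d}.{bj:03d}.{suffix}"
        for bj in range(1, n_tiles_y + 1)
        for bi in range(1, n_tiles_x + 1)
    ]


def missing_tile_files(run_dir, prefixes, n_tiles_x, n_tiles_y):
    """Return the expected tile files that are not present in run_dir."""
    run_dir = Path(run_dir)
    missing = []
    for prefix in prefixes:
        for name in expected_tile_files(prefix, n_tiles_x, n_tiles_y):
            if not (run_dir / name).is_file():
                missing.append(name)
    return missing
```

Calling missing_tile_files(".", ["maskobcsn", "xx_obcsn.0000000000"], 4, 4)
from the master node's run directory would list the files that still need
to be copied over, assuming a hypothetical 4 x 4 tile layout.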


