[MITgcm-support] Re: is this bug in mitgcm code?

Suneet Dwivedi suneetdwivedi at gmail.com
Fri May 9 13:00:39 EDT 2008


Hi Dimitris and Matthew,
Thanks for your reply.
Dimitris, I had actually already tried using both global files and
useSingleCpuIo, but it didn't work for me. I got the same error message as
before.
Matthew, I shall look at the tar file you sent, but can you please
tell me in which part of the code I should make those changes?
Probably in the mdsio package, I guess. Is that right?
Thanks again,
Suneet

On Fri, May 9, 2008 at 12:41 PM, Matthew Mazloff <mmazloff at mit.edu> wrote:
> Hi Suneet,
>
> You are correct that the current version of the MITgcm does not allow
> useSingleCpuIo for obcs controls.
>
> Patrick, you implemented useSingleCpuIo for the mdsio slice routines a while
> back for me.  I have also changed them to write slices only for the northern
> boundary, not global files.  I have attached them. The issue with checking
> them in is that one should first remove the change I made to the xz code
> (bracketed with CMM's), as it is not robust and will crash for
> "useSingleCpuIo=.FALSE."
>
> I am not sure what else may be wrong with the code, but it worked for the
> northern boundary.
>
> -Matt
>
>
>
>
>
>
> On May 9, 2008, at 12:12 PM, Suneet Dwivedi wrote:
>
>> Hi Everyone,
>> When I tried to run the MITgcm adjoint model with the obcs package on, I
>> ended up with the following error message:
>>
>> -------------------------------------------------------------------------------------------------------------------------------
>> (PID.TID 0000.0001) *** ERROR *** MDSREADFIELD_XZ_GL: File does not exist
>>
>> --------------------------------------------------------------------------------------------------------------------------------
>> My model stopped at:
>>
>> --------------------------------------------------------------------------------------------------------------------------------
>> (PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: opening file: maskobcsn.001.001.data
>> (PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: filename: maskobcsn.002.001.data
>>
>> ----------------------------------------------------------------------------------------------------------------------------------
>> The detailed description of the problem and the solution (that worked
>> for me)  is as follows:
>> (i) The model works fine (stops with 'normal end') when I use one or
>> two processors on a single node.
>> (ii) The model crashes with the above error message when I start using
>> 16 processors on maybe 10 nodes.
>>
>> When I looked for the missing files
>> 'maskobcsn.002.001.data', 'maskobcsn.003.001.data' and so on, I found
>> that they had been written on nodes other than the master node, even
>> though I used "useSingleCpuIo=.TRUE.". I then copied all these files
>> manually to the master node, after which the model stopped at a different
>> place, looking for "adxx_obcsn.0000000000.002.001.data" and then for
>> "xx_obcsn.0000000000.002.001.data" and
>> "pickup_obE.ckptA.002.002.data". So I copied all the files associated
>> with obcs control to the master node and reran the model, and this time
>> it worked fine for me (stopped with normal end). This means that
>> useSingleCpuIo=.TRUE. does not work for obcs control files. I wonder
>> whether this is a bug in MITgcm when using obcs along with ecco. Am I
>> missing some statement required for the obcs package to write output on a
>> single CPU while running the model on multiple processors? Please help me
>> sort out this problem.
>> Hoping for a reply,
>> Cheers,
>> Suneet
>
>
>
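
The manual search for missing per-tile files described in the quoted
message could be scripted. The following is only a minimal sketch, not
part of MITgcm: the naming pattern <prefix>.<tile x>.<tile y>.data is
inferred from the file names quoted above, and the function names, the
4 x 4 tile layout and the run directory are hypothetical.

```python
from pathlib import Path


def expected_tile_files(prefix, n_tiles_x, n_tiles_y, suffix="data"):
    """Return per-tile file names of the form <prefix>.<ix>.<iy>.<suffix>.

    The three-digit, 1-based tile indices are an assumption inferred from
    names such as maskobcsn.002.001.data in the message above.
    """
    return [
        f"{prefix}.{ix:03d}.{iy:03d}.{suffix}"
        for iy in range(1, n_tiles_y + 1)
        for ix in range(1, n_tiles_x + 1)
    ]


def missing_tile_files(run_dir, prefix, n_tiles_x, n_tiles_y):
    """List the expected per-tile files that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [
        name
        for name in expected_tile_files(prefix, n_tiles_x, n_tiles_y)
        if not (run_dir / name).is_file()
    ]


if __name__ == "__main__":
    # Hypothetical layout: 4 x 4 tiles for a 16-process run.
    prefixes = ("maskobcsn", "xx_obcsn.0000000000", "adxx_obcsn.0000000000")
    for prefix in prefixes:
        for name in missing_tile_files(".", prefix, 4, 4):
            print("missing:", name)
```

Run in the master node's run directory, this would print every tile file
that has to be copied over from the other nodes before restarting.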


