[MITgcm-support] Re: is this bug in mitgcm code?

Matthew Mazloff mmazloff at MIT.EDU
Fri May 9 13:09:27 EDT 2008


Hi Suneet,

You can just put these three files into your code folder. It may
work; if not, let me know what error you get.

good luck
-matt


On May 9, 2008, at 1:00 PM, Suneet Dwivedi wrote:

> Hi Dimitris and Matthew,
> Thanks for your reply.
> Dimitris, I had actually already tried using both global files and
> useSingleCpuIo, but it didn't work for me; I got the same error message as
> before.
> Matthew, I shall look at the tar file you sent, but could you please
> tell me which part of the code I should change?
> Probably the mdsio package (I guess), right?
> Thanks again,
> Suneet
>
> On Fri, May 9, 2008 at 12:41 PM, Matthew Mazloff <mmazloff at mit.edu> wrote:
>> Hi Suneet,
>>
>> You are correct that the current version of the MITgcm does not allow
>> useSingleCpuIo for obcs controls.
>>
>> Patrick, a while back you implemented useSingleCpuIo for the mdsio slice
>> routines for me. I have also changed them to write slices only for the
>> northern boundary, not global files. I have attached them. The issue with
>> checking them in is that one would first have to remove the change I made
>> to the xz code (bracketed with CMM's), as it is not robust and will crash
>> with "useSingleCpuIo=.FALSE."
>>
>> I'm not sure what else may be wrong with the code, but it worked for the
>> northern boundary.
>>
>> -Matt
>>
>> On May 9, 2008, at 12:12 PM, Suneet Dwivedi wrote:
>>
>>> Hi Everyone,
>>> When I tried to run the MITgcm adjoint model with the obcs package on, I
>>> ended up with the following error message:
>>>
>>> ------------------------------------------------------------------------
>>> (PID.TID 0000.0001) *** ERROR *** MDSREADFIELD_XZ_GL: File does not exist
>>>
>>> ------------------------------------------------------------------------
>>> My model stopped at:
>>>
>>> ------------------------------------------------------------------------
>>> (PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: opening file: maskobcsn.001.001.data
>>> (PID.TID 0000.0001)  MDSREADFIELD_XZ_GL: filename: maskobcsn.002.001.data
>>>
>>> ------------------------------------------------------------------------
>>> A detailed description of the problem, and the workaround that worked
>>> for me, is as follows:
>>> (i) The model works fine (stops with 'normal end') when I use one or two
>>> processors on a single node.
>>> (ii) The model crashes with the error message above when I use
>>> 16 processors spread over roughly 10 nodes.
>>>
>>> When I went looking for the missing files 'maskobcsn.002.001.data',
>>> 'maskobcsn.003.001.data' and so on, I found that they had been written on
>>> nodes other than the master node, even though I used
>>> "useSingleCpuIo=.TRUE.". I then copied all these files manually to the
>>> master node, after which the model stopped at a different place, looking
>>> for "adxx_obcsn.0000000000.002.001.data", then for
>>> "xx_obcsn.0000000000.002.001.data" and "pickup_obE.ckptA.002.002.data".
>>> So I copied all the files associated with the obcs control to the master
>>> node and reran the model, and this time it worked (stopped with 'normal
>>> end'). This means that useSingleCpuIo=.TRUE. does not work for the obcs
>>> control files. Is this a bug in MITgcm when using obcs along with ecco,
>>> or am I missing some setting the obcs package needs in order to write its
>>> output from a single CPU while running on multiple processors? Please
>>> help me sort out this problem.
>>> Hoping for reply,
>>> Cheers,
>>> Suneet
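
For anyone hitting the same problem: the manual check-and-copy workaround described in the original report can be scripted. The sketch below is a hypothetical Python helper, not part of MITgcm. It assumes per-tile files are named <prefix>.XXX.YYY.data with three-digit, 1-based tile indices (as in maskobcsn.002.001.data or xx_obcsn.0000000000.002.001.data), that the tile counts are presumably nSx*nPx and nSy*nPy from SIZE.h, and that the node-local output directories are reachable as ordinary directories from the master node. It only handles .data files; the matching .meta files may need the same treatment.

```python
"""Hypothetical helper (not part of MITgcm): find per-tile files missing from a
run directory and copy them in from other, e.g. node-local, directories."""
import shutil
from pathlib import Path


def tile_filenames(prefix: str, n_tiles_x: int, n_tiles_y: int) -> list[str]:
    # Assumed naming: <prefix>.XXX.YYY.data, XXX = x-tile index, YYY = y-tile
    # index, both 1-based and zero-padded to three digits.
    return [
        f"{prefix}.{i:03d}.{j:03d}.data"
        for j in range(1, n_tiles_y + 1)
        for i in range(1, n_tiles_x + 1)
    ]


def missing_tiles(run_dir, prefix: str, n_tiles_x: int, n_tiles_y: int) -> list[str]:
    # Names of expected tile files that are not present in run_dir.
    run_dir = Path(run_dir)
    return [
        name
        for name in tile_filenames(prefix, n_tiles_x, n_tiles_y)
        if not (run_dir / name).exists()
    ]


def gather_tiles(run_dir, search_dirs, prefix: str, n_tiles_x: int, n_tiles_y: int) -> list[str]:
    """Copy each missing tile file from the first search directory that has it.

    Returns the names that could not be found anywhere."""
    run_dir = Path(run_dir)
    still_missing = []
    for name in missing_tiles(run_dir, prefix, n_tiles_x, n_tiles_y):
        for directory in search_dirs:
            src = Path(directory) / name
            if src.exists():
                # copy2 also preserves timestamps, which helps when comparing runs
                shutil.copy2(src, run_dir / name)
                break
        else:
            # no search directory had this tile
            still_missing.append(name)
    return still_missing
```

For example, gather_tiles("run", ["/scratch/node02/run"], "xx_obcsn.0000000000", 4, 4) would copy whatever xx_obcsn tiles it can find and return the names still missing; the paths and the 4x4 tile layout are placeholders. This only automates the workaround; it does not fix the underlying useSingleCpuIo behaviour of the obcs control I/O.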




More information about the MITgcm-support mailing list