[MITgcm-devel] netcdf on sx8

Martin Losch Martin.Losch at awi.de
Thu Nov 27 11:20:47 EST 2008


Following up on my own earlier observation:
the error for lab_sea has not gone away, and I still don't know
exactly what the problem is. Apparently, when mitgcmuv tries to
create the file for the second tile, the netcdf library routine
NF_CREATE returns an error code (12) that translates into "Not enough
space". I have no idea why this error should arise: I have about
380GB of disk space available, and the calling statement is
completely independent of the size of the problem:
err = NF_CREATE(fname, NF_CLOBBER, fid). The only variable input is
fname, which is a character string of length 500 (MNC_MAX_PATH).
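
A quick way to see what status 12 stands for is sketched below in
Python. It assumes the netCDF-3 convention that positive status values
are system errno codes while library errors are negative, so 12 would
be ENOMEM, a memory allocation failure rather than a disk-space
problem. The helper name describe_status is made up for illustration,
and the message text comes from the local C library, so it differs
between systems.

```python
import errno
import os


def describe_status(status):
    """Classify a netCDF-3 style status code.

    Assumption: positive values are system errno codes and negative
    values are netCDF library errors.
    """
    if status == 0:
        return "NF_NOERR"
    if status > 0:
        # Look up the symbolic errno name and the local C library's message.
        name = errno.errorcode.get(status, "unknown errno")
        return f"system error {name}: {os.strerror(status)}"
    return "netCDF library error (negative code)"


print(describe_status(12))
```

The wording of the message depends on the platform: Solaris, for
example, words ENOMEM as "Not enough space".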

When I comment out the stop statement in mnc_handle_err, the model  
finishes with many error messages from the mnc-package (mostly  
invalid id) and produces a corrupted netcdf file for each of the  
variables that are saved after the initial problem occurs.

All of this happens with 2 tiles (1 tile is OK, obviously, because no
second file is opened), regardless of whether I run on 1 or 2 CPUs
(nSx=2 or nPx=2).

To me this looks very much like a non-local problem with memory array  
boundaries, but I have no clue why and where this should happen. I  
have tried an array bound check with -eC, but that seemed to be OK.  
Something really fishy ...

Any comments are welcome,

Martin

cc to Jens-Olaf, although he cannot reply to this list.

Oh yes, happy thanksgiving ...

On 30 Jun 2008, at 10:28, Martin Losch wrote:

> Hi all,
>
> I found a funny error with netcdf in my SX8 routine test: in
> lab_sea/run I get this
> > cat STDERR.*
> (PID.TID 0001.0001) *** ERROR *** NetCDF ERROR:les
> (PID.TID 0001.0001) *** ERROR *** MNC ERROR: opening 'phiHydLow.0000000000.t002.nc'
> > cat STDOUT.0001
>  NetCDF ERROR:
>  ===
>  Not enough space
>  ===
>  MNC ERROR: opening 'phiHydLow.0000000000.t002.nc'
>
> and in ideal_2D_oce
> > cat STDERR.*
> (PID.TID 0001.0001) *** ERROR *** NetCDF ERROR:
> (PID.TID 0001.0001) *** ERROR *** MNC ERROR: opening 'flxDiag.0000036000.t004.nc'
> > tail STDOUT.0001
>  NetCDF ERROR:
>  ===
>  Not enough space
>  ===
>  MNC ERROR: opening 'flxDiag.0000036000.t004.nc'
>
> phiHydLow is not part of the diagnostics output, and flxDiag.* is only
> the 4th output stream in data.diagnostics. By lucky accident I
> found that the second error occurs when the model calls
>> C       Update the record dimension by writing the iteration number
>>         CALL MNC_CW_SET_UDIM(diag_mnc_bn, -1, myThid)
>>         CALL MNC_CW_RL_W_S('D',diag_mnc_bn,0,0,'T',myTime,myThid)   
>> <=======
>>         CALL MNC_CW_SET_UDIM(diag_mnc_bn, 0, myThid)
>>         CALL MNC_CW_I_W_S('I',diag_mnc_bn,0,0,'iter',myIter,myThid)
>>
> from diagnostics_out.F
>
> "Not enough space" cannot refer to disk space, as I am well below
> my file number and disk space quotas.
>
> Any idea what could be going on? The other examples with netcdf
> seem to be doing fine (and in our "production" runs we generally
> don't have problems with MITgcm+netcdf ...)
>
> Martin



