[MITgcm-devel] netcdf on sx8
Martin Losch
Martin.Losch at awi.de
Thu Nov 27 11:20:47 EST 2008
Following up on my own previous observation:
the error for lab_sea has not gone away, and I still don't know
exactly what the problem is. But apparently, when mitgcmuv is trying
to create the file for the second tile, the netcdf library routine
NF_CREATE returns an error code (12) that translates into "Not enough
space". I still have no idea why this error should arise. I have
about 380GB of disk space available. The exact calling statement is
also completely independent of the size of the problem: err =
NF_CREATE(fname, NF_CLOBBER, fid). The only input is fname, which is a
character string of length 500 (MNC_MAX_PATH).
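One possible reading of the code 12, offered as an assumption rather
than a diagnosis: the netcdf library reports its own errors as negative
numbers and passes operating-system errno values through as positive
ones. On most Unix systems errno 12 is ENOMEM (a failed memory
allocation), which some System V-derived systems print as "Not enough
space", whereas a full disk would normally show up as ENOSPC. The
sketch below only illustrates that distinction. The helper name
classify_netcdf_status is hypothetical, and the numeric values and
message text come from the machine running the script, not from the SX8.

```python
import errno
import os


def classify_netcdf_status(status: int) -> str:
    """Classify a netcdf return status by its sign (hypothetical helper)."""
    if status == 0:
        return "no error"
    if status > 0:
        # Positive values are operating-system errno codes passed through
        # by the library; look up the symbolic name on this platform.
        name = errno.errorcode.get(status, "unknown errno")
        return f"system error {name}: {os.strerror(status)}"
    # Negative values are errors defined by the netcdf library itself.
    return "NetCDF library error"


if __name__ == "__main__":
    # The message text is platform dependent, so no output is asserted here.
    print(classify_netcdf_status(12))
    print("ENOMEM =", errno.ENOMEM, "| ENOSPC =", errno.ENOSPC)
```

If this reading holds on the SX8, the message would point to a failed
allocation inside the library rather than to the file system, which
would fit with the 380GB of free disk space.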
When I comment out the stop statement in mnc_handle_err, the model
finishes with many error messages from the mnc-package (mostly
invalid id) and produces a corrupted netcdf file for each of the
variables that are saved after the initial problem occurs.
All of this happens for 2 tiles (1 tile is obviously OK, because no
second file is opened), regardless of whether I run on 1 or 2 CPUs
(nSx=2 or nPx=2).
To me this looks very much like a non-local problem with memory array
boundaries, but I have no clue why and where this should happen. I
have tried an array bound check with -eC, but that seemed to be OK.
Something really fishy ...
Any comments are welcome,
Martin
cc to Jens-Olaf, although he cannot reply to this list.
Oh yes, happy thanksgiving ...
On 30 Jun 2008, at 10:28, Martin Losch wrote:
> Hi all,
>
> I found a funny error with netcdf in my SX8 routine test: in
> lab_sea/run
> I get this
> > cat STDERR.*
> (PID.TID 0001.0001) *** ERROR *** NetCDF ERROR:les
> (PID.TID 0001.0001) *** ERROR *** MNC ERROR: opening 'phiHydLow.
> 0000000000.t002.nc'
> > cat STDOUT.0001
> NetCDF ERROR:
> ===
> Not enough space
> ===
> MNC ERROR: opening 'phiHydLow.0000000000.t002.nc'
>
> and in ideal_2D_oce
> > cat STDERR.*
> (PID.TID 0001.0001) *** ERROR *** NetCDF ERROR:
> (PID.TID 0001.0001) *** ERROR *** MNC ERROR: opening 'flxDiag.
> 0000036000.t004.nc'
> > tail STDOUT.0001
> NetCDF ERROR:
> ===
> Not enough space
> ===
> MNC ERROR: opening 'flxDiag.0000036000.t004.nc'
>
> phiHydLow is not part of the diagnostics output, and flxDiag.* is
> only the 4th output stream in data.diagnostics. By lucky accident I
> found that the second error occurs when the model calls
>> C Update the record dimension by writing the iteration number
>> CALL MNC_CW_SET_UDIM(diag_mnc_bn, -1, myThid)
>> CALL MNC_CW_RL_W_S('D',diag_mnc_bn,0,0,'T',myTime,myThid)
>> <=======
>> CALL MNC_CW_SET_UDIM(diag_mnc_bn, 0, myThid)
>> CALL MNC_CW_I_W_S('I',diag_mnc_bn,0,0,'iter',myIter,myThid)
>>
> from diagnostics_out.F
>
> "Not enough space" cannot refer to disk space, as I am well below
> my file-number and disk-space quotas.
>
> Any idea what could be going on? The other examples with netcdf
> seem to be doing fine (and in our "production" runs we generally
> don't have problems with MITgcm+netcdf ...)
>
> Martin
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel