[MITgcm-devel] mnc and "global" files

Baylor Fox-Kemper baylor at MIT.EDU
Wed Sep 7 21:51:43 EDT 2005


Ed,
   While you're at it, why don't you put a "p" in front of the processor
number?

>> pickup.0000001440.p0000.f000001.nc
>> pickup.0000001440.p0001.f000001.nc

Cheers,
   -Baylor
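
Below is a minimal Python sketch of how names in the form suggested in this thread could be built and parsed: a "p" before the processor number, and "glob" in place of the processor field for assembled per-face files. The helper names mnc_name and parse_mnc_name, and the field widths (10-digit myIter, 4-digit processor, 6-digit face), are assumptions taken from the examples in the thread. This is not existing MITgcm or mnc code.

```python
import fnmatch
import re

# Hypothetical naming helpers (not MITgcm code). Assumed forms:
#   BASENAME.MYITER.pPROC.fFACE.nc   per-processor file
#   BASENAME.MYITER.glob.fFACE.nc    assembled "global" per-face file
# Widths follow the thread's examples: 10-digit myIter,
# 4-digit processor number, 6-digit face number.
NAME_RE = re.compile(
    r"^(?P<base>[^.]+)\.(?P<iter>\d{10})\."
    r"(?:p(?P<proc>\d{4})|glob)\.f(?P<face>\d{6})\.nc$"
)


def mnc_name(base, my_iter, face, proc=None):
    """Per-processor name, or the global per-face name if proc is None."""
    middle = "glob" if proc is None else f"p{proc:04d}"
    return f"{base}.{my_iter:010d}.{middle}.f{face:06d}.nc"


def parse_mnc_name(name):
    """Return (base, my_iter, proc, face); proc is None for global files.

    Returns None if the name does not follow the assumed scheme.
    Basenames containing dots are not supported by this sketch.
    """
    m = NAME_RE.match(name)
    if m is None:
        return None
    proc = None if m["proc"] is None else int(m["proc"])
    return m["base"], int(m["iter"]), proc, int(m["face"])


if __name__ == "__main__":
    names = [mnc_name("pickup", 1440, 1, proc) for proc in range(2)]
    names.append(mnc_name("pickup", 1440, 1))
    for n in names:
        print(n, parse_mnc_name(n))
    # Shell-style globbing picks out only the assembled file here.
    print(fnmatch.filter(names, "pickup.*glob*.nc"))
```

Because the global token occupies the same slot as the processor field, one pattern such as pickup.*glob*.nc separates the assembled files from the per-processor ones, provided the basename itself does not contain "glob".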

On Sep 7, 2005, at 9:40 PM, Baylor Fox-Kemper wrote:

> Oops,
>   I forgot the really good part of 5)ii):
>
> If state.0000001440.0000.f000001.nc exceeds 2GB, then you might get
>
>> pickup.0000001440.0000.f000001.nc
>> pickup.0000001440.0001.f000001.nc
>> state.0000001440.0000.f000001.nc
>> state.0000001440.0001.f000001.nc
>> ...
>
>> state.0000002500.0000.f000001.nc
>> state.0000002500.0001.f000001.nc
>> ...
>
>> pickup.0000002880.0000.f000001.nc
>> pickup.0000002880.0001.f000001.nc
>> state.0000002880.0000.f000001.nc
>> state.0000002880.0001.f000001.nc
>> ...
>
> But the pickup files would still be synched with state.*.nc and the
> other outputs at every pickup...
>
> One last thing:  If the snapshot timing and pickup timing don't line up,
> then err on the side of not repeating data:
>
>> pickup.0000001440.0000.f000001.nc
>> pickup.0000001440.0001.f000001.nc
>> state.0000001448.0000.f000001.nc
>> state.0000001448.0001.f000001.nc
>> ...
>
>> state.0000002500.0000.f000001.nc
>> state.0000002500.0001.f000001.nc
>> ...
>
>> pickup.0000002880.0000.f000001.nc
>> pickup.0000002880.0001.f000001.nc
>> state.0000002888.0000.f000001.nc
>> state.0000002888.0001.f000001.nc
>> ...
>
> Cheers,
>   -Baylor
>
> On Sep 7, 2005, at 9:35 PM, Baylor Fox-Kemper wrote:
>
>> Hi Ed,
>>   A few points:
>>
>> 1)  The underscore is overkill: BASENAME.MYITER.fFACENUM.nc (e.g.,
>> state.0000000000.f000001.nc) suffices.  I hate having to hunt and peck
>> up in the top row of the keyboard...
>>
>> 2)  Whatever naming scheme is chosen, it should be IDENTICAL to the 
>> per-processor files, except the per-processor files should have 
>> another number.  Thus:
>>
>> pickup.0000001440.f000001.nc
>>
>> is a global file, which could be assembled from the processor outputs:
>>
>> pickup.0000001440.0000.f000001.nc
>> pickup.0000001440.0001.f000001.nc
>> pickup.0000001440.0002.f000001.nc
>> pickup.0000001440.0003.f000001.nc
>>
>> Or, perhaps more clearly, global files should replace the processor 
>> number with a similar length symbol, e.g.,
>>
>> pickup.0000001440.all.f000001.nc
>> or
>> pickup.0000001440.glob.f000001.nc
>>
>> I personally find the latter much easier to pick out of an ls
>> listing, and it has the added advantage of easy globbing:
>>
>> ls pickup.*all*.nc
>>
>> or even
>>
>> ls pick*a*.nc
>>
>> 3) The matlab script I wrote should be easily adaptable to converting
>> back and forth between per-processor and global files.
>>
>> 4) Putting myIter in the file name is really an improvement.
>>
>> 5)  Don't forget our other CRITICAL improvement (which I just spent an
>> hour figuring out on some old outputs restarted with messy pickups).
>> We need to synch the output of pickup files with the file splitting
>> imposed by the <2GB requirement!!!  As it currently exists, if one
>> restarts from a pickup file, there will be a few stragglers left behind
>> in, say, state.*.nc, so that the new state.*.nc repeats values from the
>> old one.  I thus recommend either
>>
>> i) a flag that lets you output a pickup file at the first iteration
>> of every new state.*.nc file,
>> or better,
>> ii) a flag that lets you clip the state.*.nc, tave.*.nc, etc. files
>> every time a pickup is generated.  So, a big run might produce:
>>
>> pickup.0000001440.0000.f000001.nc
>> pickup.0000001440.0001.f000001.nc
>> state.0000001440.0000.f000001.nc
>> state.0000001440.0001.f000001.nc
>> ...
>>
>> pickup.0000002880.0000.f000001.nc
>> pickup.0000002880.0001.f000001.nc
>> state.0000002880.0000.f000001.nc
>> state.0000002880.0001.f000001.nc
>> ...
>>
>> from which we could form, using a matlab script, the wonderful (and
>> synchronized) global files
>>
>> pickup.0000001440.glob.f000001.nc
>> state.0000001440.glob.f000001.nc
>> ...
>>
>> pickup.0000002880.glob.f000001.nc
>> state.0000002880.glob.f000001.nc
>> ...
>>
>> Then, if I decided I no longer needed the data from 1440 to 2880, I 
>> could just
>>
>> rm state.0000001440.*.nc
>>
>> But I could easily regenerate it from
>> pickup.0000001440.glob.f000001.nc, and it wouldn't overlap with the 
>> preceding or following state files.
>>
>> Cheers,
>>    -Baylor
>>
>>
>>
>> On Sep 7, 2005, at 9:06 PM, Ed Hill wrote:
>>
>>>
>>> Hi folks,
>>>
>>> Jean-Michel, Baylor, Daniel, and I recently discussed the lack of
>>> "global" files for mnc/netCDF and have come up with the following
>>> scheme, which is designed to be very general and flexible:
>>>
>>>   1) For mnc, we won't create a single "global" file like mdsio
>>>      does.  It just doesn't work well for non-cube domains.  Instead,
>>>      we will have a "global" format that is PER FACE since each
>>>      face is logically rectangular, readily maps to netCDF format,
>>>      and can be easily cut up into one or more tiles.
>>>
>>>   2) Since netCDF v3 is not safe for writing from multiple MPI
>>>      processes or threads, we will (at least initially) only
>>>      support the READING of "global" (again, think "per-face")
>>>      files.  The creation of "global" per-face files from
>>>      collections of per-tile files can be done by a post-processing
>>>      script.  We have a MatLAB script that does it.
>>>
>>>   3) The naming scheme that J-M and I propose is:
>>>
>>>         PER FACE:  BASENAME.MYITER.f_FACENUM.nc
>>>           eg:  state.0000000000.f_000001.nc
>>>                phiHydLow.0017280000.f_000003.nc
>>>                dynDiag.0000864000.f_000006.nc
>>>
>>>         PER TILE:  BASENAME.MYITER.t_TILENUM.nc
>>>           eg:  state.0000000000.t_000001.nc
>>>                phiHydLow.0017280000.t_000003.nc
>>>                dynDiag.0000864000.t_000201.nc
>>>
>>>      where:
>>>        BASENAME gives some indication of the type or source
>>>          of the variables within the file
>>>        MYITER is a 10-digit number (much like mdsio) containing
>>>          the model iteration count (myIter) at which the file
>>>          is created.  Typically, files will start at nIter0
>>>          and more files will be created as the netCDF files
>>>          either reach capacity (remember, there is a 2GB file
>>>          size limit on many filesystems so we can only fit a
>>>          finite number of time steps in each file) or reach
>>>          a specified time period (so it's easy to create a new
>>>          set of files every month or year or ...).
>>>        FACENUM or TILENUM is, respectively, either a global
>>>          face index (prefixed by "f_") or a global tile index
>>>          (prefixed by "t_").  A MatLAB script will be written
>>>          to spatially assemble tile files into "global" per-
>>>          face files.
>>>
>>>   4) A new flag or flags (and probably some logic) will be added
>>>      to allow the specification of how much time or how many
>>>      model iterations should pass before a new set of netCDF
>>>      files is created.  New files will use the then-current
>>>      myIter value for their names so that the correct file
>>>      sequence is easily recognized.
>>>
>>> So, does anyone have any vetoes or suggestions for improvement?
>>>
>>> Ed
>>>
>>>
>>> -- 
>>> Edward H. Hill III, PhD
>>> office:  MIT Dept. of EAPS;  Rm 54-1424;  77 Massachusetts Ave.
>>>              Cambridge, MA 02139-4307
>>> emails:  eh3 at mit.edu                ed at eh3.com
>>> URLs:    http://web.mit.edu/eh3/    http://eh3.com/
>>> phone:   617-253-0098
>>> fax:     617-253-4464
>>>
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>>
>
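
Ed's point 4) and Baylor's point 5)ii) both concern the iteration at which a new set of output files begins, since that iteration becomes the MYITER label. The Python sketch below is one hypothetical way to compute those labels: a new file starts when the current one already holds max_records records (a stand-in for the 2GB limit) or when a pickup has been written since the current file began. When snapshot and pickup times do not line up, the new file starts at the first snapshot after the pickup, as in the 1440/1448 example above, so no data are repeated. The function name file_start_iters and its parameters are assumptions for illustration, not an MITgcm interface.

```python
from bisect import bisect_right


def file_start_iters(output_iters, max_records, pickup_iters):
    """Return the myIter label of each output file (hypothetical sketch).

    output_iters: iterations at which a record is written (snapshots).
    max_records:  records per file before splitting, a stand-in for
                  the 2GB file-size limit.
    pickup_iters: iterations at which pickup files are written.
    """
    pickups = sorted(pickup_iters)
    starts = []
    count = 0  # records written to the current file so far
    for it in sorted(output_iters):
        crossed_pickup = False
        if starts:
            # Most recent pickup at or before this output iteration.
            i = bisect_right(pickups, it)
            if i > 0 and pickups[i - 1] > starts[-1]:
                crossed_pickup = True
        if not starts or count >= max_records or crossed_pickup:
            starts.append(it)  # open a new file labelled with this myIter
            count = 0
        count += 1
    return starts


if __name__ == "__main__":
    # Snapshots every 16 iterations from 1448; pickups at 1440 and 2880.
    for it in file_start_iters(range(1448, 3000, 16), 10**6, [1440, 2880]):
        print(f"state.{it:010d}.0000.f000001.nc")
```

With these inputs the sketch labels the files 1448 and 2888, matching the no-repeat example in the thread. Lowering max_records adds size-based splits between pickups (like the state.0000002500 files above) without changing the fact that each file set restarts at the first snapshot following a pickup.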



