[MITgcm-devel] mnc and "global" files
Baylor Fox-Kemper
baylor at MIT.EDU
Wed Sep 7 21:40:30 EDT 2005
Oops,
I forgot the really good part of 5)ii):
if state.0000001440.0000.f000001.nc is > 2GB, then you might get
> pickup.0000001440.0000.f000001.nc
> pickup.0000001440.0001.f000001.nc
> state.0000001440.0000.f000001.nc
> state.0000001440.0001.f000001.nc
> ...
> state.0000002500.0000.f000001.nc
> state.0000002500.0001.f000001.nc
> ...
> pickup.0000002880.0000.f000001.nc
> pickup.0000002880.0001.f000001.nc
> state.0000002880.0000.f000001.nc
> state.0000002880.0001.f000001.nc
> ...
But, the pickup files would still be synched with state.*.nc and other
outputs every so often...
One last thing: If snapshot timing and pickup timing don't line up,
then err on the side of not repeating data:
> pickup.0000001440.0000.f000001.nc
> pickup.0000001440.0001.f000001.nc
> state.0000001448.0000.f000001.nc
> state.0000001448.0001.f000001.nc
> ...
> state.0000002500.0000.f000001.nc
> state.0000002500.0001.f000001.nc
> ...
> pickup.0000002880.0000.f000001.nc
> pickup.0000002880.0001.f000001.nc
> state.0000002888.0000.f000001.nc
> state.0000002888.0001.f000001.nc
> ...
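
To make the splitting rule concrete, here is a minimal Python sketch of how snapshot iterations might be grouped into state files so that nothing written before a pickup shares a file with anything written after it. The names plan_state_files and state_name, their arguments, and max_snaps_per_file (a stand-in for the 2GB limit) are all hypothetical; none of this is existing mnc code or an existing flag.

```python
from bisect import bisect_right


def plan_state_files(snapshot_iters, pickup_iters, max_snaps_per_file):
    """Group snapshot iterations into files; return {start_iter: [iters]}.

    A new file is started when
      * a pickup was written after the previous snapshot and at or
        before this one (so a restart never overlaps an old file), or
      * the current file already holds max_snaps_per_file snapshots
        (hypothetical stand-in for hitting the 2GB limit).
    """
    pickups = sorted(pickup_iters)
    files = {}
    current = None   # start iteration of the file being filled
    prev = None      # iteration of the previous snapshot
    for it in sorted(snapshot_iters):
        # bisect_right counts the pickups at or before an iteration
        pickup_since_prev = (
            prev is not None
            and bisect_right(pickups, it) > bisect_right(pickups, prev)
        )
        full = current is not None and len(files[current]) >= max_snaps_per_file
        if current is None or pickup_since_prev or full:
            current = it
            files[current] = []
        files[current].append(it)
        prev = it
    return files


def state_name(start_iter, proc, face, base="state"):
    """Per-processor name in the style used in the listings above."""
    return f"{base}.{start_iter:010d}.{proc:04d}.f{face:06d}.nc"


if __name__ == "__main__":
    # Snapshots every 360 iterations starting at 8, pickups every 1440:
    # the two schedules do not line up.
    plan = plan_state_files(range(8, 3249, 360), [1440, 2880], 3)
    for start in plan:
        print(state_name(start, 0, 1), plan[start])
```

With these made-up numbers the files start at 8, 1088, 1448, 2528 and 2888: the breaks at 1088 and 2528 come from the size stand-in, and the breaks at 1448 and 2888 are the first snapshots after the pickups at 1440 and 2880.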
Cheers,
-Baylor
On Sep 7, 2005, at 9:35 PM, Baylor Fox-Kemper wrote:
> Hi Ed,
> A few points:
>
> 1) The underscore is overkill-- BASENAME.MYITER.fFACENUM.nc or
> state.0000000000.f000001.nc suffices. I hate having to hunt and peck
> up in the top of the keyboard...
>
> 2) Whatever naming scheme is chosen, it should be IDENTICAL to the
> per-processor files, except the per-processor files should have
> another number. Thus:
>
> pickup.0000001440.f000001.nc
>
> is a global file, which could be composed of the processor outputs:
>
> pickup.0000001440.0000.f000001.nc
> pickup.0000001440.0001.f000001.nc
> pickup.0000001440.0002.f000001.nc
> pickup.0000001440.0003.f000001.nc
>
> Or, perhaps more clearly, global files should replace the processor
> number with a similar length symbol, e.g.,
>
> pickup.0000001440.all.f000001.nc
> or
> pickup.0000001440.glob.f000001.nc
>
> I personally find the latter much easier to pick out of ls output, and
> it has the added advantage of easy globbing:
>
> ls pickup.*all*.nc
>
> or even
>
> ls pick*a*.nc
>
> 3) The matlab script I wrote should be easily adaptable to converting
> back and forth for such files.
>
> 4) The myiter is really an improvement.
>
> 5) Don't forget our other CRITICAL improvement (which I just spent an
> hour figuring out on some old outputs restarted with messy pickups).
> We need to sync the output of pickup files with the file splitting
> forced by the <2GB requirement!!! As it currently exists, if one restarts from a pickup
> file, there will be a few stragglers left behind in say, state.*.nc,
> so that the new state.*.nc has repeated values from the old one. I
> thus recommend either,
>
> i) a flag that lets you output a pickup file every time a new
> state.*.nc gets created, at the first iteration.
> or better,
> ii) a flag that lets you clip the state.*.nc, tave.*.nc, etc. every
> time a pickup is generated. So, a big run might produce:
>
> pickup.0000001440.0000.f000001.nc
> pickup.0000001440.0001.f000001.nc
> state.0000001440.0000.f000001.nc
> state.0000001440.0001.f000001.nc
> ...
>
> pickup.0000002880.0000.f000001.nc
> pickup.0000002880.0001.f000001.nc
> state.0000002880.0000.f000001.nc
> state.0000002880.0001.f000001.nc
> ...
>
> from which we could form the wonderful (and synchronized) global files
> using a matlab script
>
> pickup.0000001440.glob.f000001.nc
> state.0000001440.glob.f000001.nc
> ...
>
> pickup.0000002880.glob.f000001.nc
> state.0000002880.glob.f000001.nc
> ...
>
> Then, if I decided I no longer needed the data from 1440 to 2880, I
> could just
>
> rm state.0000001440.*.nc
>
> But, I could easily regenerate it from
> pickup.0000001440.glob.f000001.nc, and it wouldn't overlap with the
> preceding or following state files.
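
The mapping from per-processor names to a "glob" name described in 2) and 5) can be sketched in a few lines of Python. This is only an illustration of the naming convention: the regular expression assumes the 10-digit iteration, 4-digit processor and 6-digit face fields seen in the examples, and the function names global_name and group_for_assembly are hypothetical. The actual merging of the netCDF data is not shown.

```python
import re
from collections import defaultdict

# BASENAME.MYITER.PROC.fFACENUM.nc, field widths taken from the examples
NAME_RE = re.compile(
    r"^(?P<base>[^.]+)\.(?P<myiter>\d{10})\.(?P<proc>\d{4})"
    r"\.f(?P<face>\d{6})\.nc$"
)


def global_name(per_proc_name, tag="glob"):
    """Replace the processor number with a tag such as 'glob' or 'all'."""
    m = NAME_RE.match(per_proc_name)
    if m is None:
        raise ValueError(f"not a per-processor file name: {per_proc_name!r}")
    return f"{m['base']}.{m['myiter']}.{tag}.f{m['face']}.nc"


def group_for_assembly(names):
    """Map each global file name to the per-processor files it is built from."""
    groups = defaultdict(list)
    for name in sorted(names):
        groups[global_name(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    listing = [
        "pickup.0000001440.0000.f000001.nc",
        "pickup.0000001440.0001.f000001.nc",
        "state.0000001440.0000.f000001.nc",
        "state.0000001440.0001.f000001.nc",
    ]
    for target, parts in group_for_assembly(listing).items():
        print(target, "<-", parts)
```

Because the global name differs from the per-processor names only in the one field, a pattern like state.0000001440.*.nc still matches both kinds of file.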
>
> Cheers,
> -Baylor
>
>
>
> On Sep 7, 2005, at 9:06 PM, Ed Hill wrote:
>
>>
>> Hi folks,
>>
>> Jean-Michel, Baylor, Daniel, and I recently discussed the lack of
>> "global" files for mnc/netCDF and have come up with the following
>> scheme, which is designed to be very general/flexible:
>>
>> 1) For mnc, we won't create a single "global" file like mdsio
>> does. It just doesn't work well for non-cube domains. Instead,
>> we will have a "global" format that is PER FACE since each
>> face is logically rectangular, readily maps to netCDF format,
>> and can be easily cut up into one or more tiles.
>>
>> 2) Given the non-MPI- and non-multithread-writing-safety of
>> netCDF v3 we will (at least initially) only support the
>> READING of "global" (again, think "per-face") files. The
>> creation of "global" per-face files from collections of
>> per-tile files can be done by a post-processing script. We
>> have a MatLAB script that does it.
>>
>> 3) The naming scheme that J-M and I propose is:
>>
>> PER FACE: BASENAME.MYITER.f_FACENUM.nc
>> eg: state.0000000000.f_000001.nc
>> phiHydLow.0017280000.f_000003.nc
>> dynDiag.0000864000.f_000006.nc
>>
>> PER TILE: BASENAME.MYITER.t_TILENUM.nc
>> eg: state.0000000000.t_000001.nc
>> phiHydLow.0017280000.t_000003.nc
>> dynDiag.0000864000.t_000201.nc
>>
>> where:
>> BASENAME gives some indication of the type or source
>> of the variables within the file
>> MYITER is a 10-digit number (much like mdsio) containing
>> the model iteration count (myIter) at which the file
>> is created. Typically, files will start at nIter0
>> and more files will be created as the netCDF files
>> either reach capacity (remember, there is a 2GB file
>> size limit on many filesystems so we can only fit a
>> finite number of time steps in each file) or reach
>> a specified time period (so it's easy to create a new
>> set of files every month or year or ...).
>> FACENUM or TILENUM is, respectively, either a global
>> face index (prefaced by "f_") or a global tile index
>> (prefaced by "t_"). A MatLAB script will be written
>> to spatially assemble tile files into "global" per-
>> face files.
>>
>> 4) A new flag or flags (and probably some logic) will be added
>> to allow the specification of how much time or how many
>> model iterations should pass before a new set of netCDF
>> files is created. New files will use the then-current
>> myIter value for their names so that the correct file
>> sequence is easily recognized.
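
For reference, the naming scheme in 3) could be generated like this. It is a minimal Python sketch, not MITgcm code: the function name mnc_name and its arguments are hypothetical, and the 6-digit face/tile field is inferred from the examples rather than stated in the proposal. The sep argument is only there so the underscore-free variant can be produced as well.

```python
def mnc_name(basename, my_iter, kind, index, sep="_"):
    """Build BASENAME.MYITER.f_FACENUM.nc or BASENAME.MYITER.t_TILENUM.nc.

    kind is 'f' for a per-face file or 't' for a per-tile file.
    """
    if kind not in ("f", "t"):
        raise ValueError("kind must be 'f' (face) or 't' (tile)")
    if not 0 <= my_iter < 10**10:
        raise ValueError("my_iter must fit in 10 digits")
    if not 0 <= index < 10**6:
        raise ValueError("index must fit in 6 digits")
    return f"{basename}.{my_iter:010d}.{kind}{sep}{index:06d}.nc"


if __name__ == "__main__":
    print(mnc_name("state", 0, "f", 1))            # state.0000000000.f_000001.nc
    print(mnc_name("dynDiag", 864000, "t", 201))   # dynDiag.0000864000.t_000201.nc
    print(mnc_name("state", 0, "f", 1, sep=""))    # state.0000000000.f000001.nc
```

Zero-padding MYITER to a fixed width means a plain alphabetical ls already lists the files of one BASENAME in time order.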
>>
>> So, does anyone have any vetoes or suggestions for improvement?
>>
>> Ed
>>
>>
>> --
>> Edward H. Hill III, PhD
>> office: MIT Dept. of EAPS; Rm 54-1424; 77 Massachusetts Ave.
>> Cambridge, MA 02139-4307
>> emails: eh3 at mit.edu ed at eh3.com
>> URLs: http://web.mit.edu/eh3/ http://eh3.com/
>> phone: 617-253-0098
>> fax: 617-253-4464
>>
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-devel