[MITgcm-support] Data formats and archiving hints
Klymak Jody
jklymak at uvic.ca
Mon Aug 3 15:38:36 EDT 2009
Hi All,
Thanks a lot for your comments and suggestions.
I'll give netCDF a try in the next few days, though I am a little unsure
how well it will scale. My grids are Cartesian (and 2-D; I do internal
wave modelling, much of it in collaboration with Sonya Legg). For
instance, my current run will be 40 GB over 16 tiles.
On 3-Aug-09, at 6:16 AM, Jean-Michel Campin wrote:
> Regarding the time, as Christopher wrote, it's in the meta file
> but it's currently ignored by rdmds. I guess we could change it
> if this is found to be useful.
Is this new? I see timeStepNumber, which I already know from the
filename, but what I need is dt*timeStepNumber.
Just so you don't think I'm a numbskull: I sometimes have runs with one
dt and higher-resolution runs with a smaller dt. Occasionally there are
enough different dt's that writing processing scripts would be a heck of
a lot easier if there were an automated way to figure out which dt I'd
used, without having to dig up the original "data" file and hard-code
the dt by hand for each run I am analyzing. And sometimes I have been
sloppy enough to overwrite the original "data" file and have had to
infer the dt from the timeStepNumber and the actual results. It seems
like the physical time the model thinks it is at should be stored
somewhere.
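For now my workaround looks something like the rough Python sketch
below: it just pulls timeStepNumber out of the .meta file (assuming the
usual " timeStepNumber = [ ... ];" line is present) and multiplies by a
dt I still have to supply by hand. The file name and dt in the example
are placeholders.

import re

def model_time_from_meta(meta_file, dt):
    # Return physical model time = dt * timeStepNumber from an MDS .meta file.
    text = open(meta_file).read()
    m = re.search(r'timeStepNumber\s*=\s*\[\s*(\d+)\s*\]', text)
    if m is None:
        raise ValueError('no timeStepNumber in %s' % meta_file)
    return dt * int(m.group(1))

# e.g.  t = model_time_from_meta('T.0000036000.meta', dt=10.0)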
>> On really large integrations (order 500x500 surface nodes and more),
>> I do not use NetCDF, because you quickly run into a netcdf file-size
>> limitation of 2GB (the MITgcm netcdf interface can handle that by
>> opening new files once you reach this limit, but it defeats some of
>> the purpose of netcdf). This limitation has been lifted in more
>> recent versions of netcdf (3.6 I think), but only to 4GB, as far as
>> I know. When you deal with really large files (order 1GB for one
>> 3-D field), netcdf becomes pretty useless as far as I am concerned.
Can the netCDF file size be tailored to a certain number of time dumps?
I deal with tidal integrations, so a file every couple of tidal cycles
would be ideal. I didn't see any parameter for this in the (old) docs I
have, just a warning to reduce write intervals or restart from pickups
(that sounds like fun). If this cannot be done, please consider this a
vote for such a feature!
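In the meantime I could probably get the same effect in post-processing;
something like the sketch below with the netCDF4 Python module, which
writes one file per N time dumps. The variable names and the
dumps-per-file count are made up for illustration, and this is a
workaround on my end, not an option in the model's netcdf interface.

import numpy as np
from netCDF4 import Dataset

def write_chunked(fields, times, dumps_per_file, prefix='run'):
    # fields: list of 2-D (nz, nx) arrays, one per time dump
    # times:  matching list of model times
    nz, nx = fields[0].shape
    for start in range(0, len(times), dumps_per_file):
        stop = min(start + dumps_per_file, len(times))
        nc = Dataset('%s_%06d.nc' % (prefix, start), 'w')
        nc.createDimension('T', None)   # unlimited record dimension
        nc.createDimension('Z', nz)
        nc.createDimension('X', nx)
        tvar = nc.createVariable('T', 'f8', ('T',))
        fvar = nc.createVariable('Field', 'f4', ('T', 'Z', 'X'))
        tvar[:] = times[start:stop]
        fvar[:] = np.array(fields[start:stop])
        nc.close()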
My other hesitation is that I modified pkg/pp81 to write out MDS files
directly, and I guess I'd have to incorporate that into the diagnostics
package instead (where it really should be, I suppose). (Not that I use
pp81; I just cannibalized the code for my own mixing scheme.)
>>
>> Your other problem: I am afraid that there is no automatic mechanism.
>> You'll need to document your runs yourself, painfully boring as it is
>> (I create hand-written tables with notes, comments, and parameter
>> values, which end up in folders that I can never find when I need
>> them).
Good to know I am not alone! I've found that a 4.5 TB NAS has made life
much easier for my humble needs. But I have GBs of wasted model output
that will never be part of a paper; I'm just not sure *which* model
output. My solution for now is to organize as follows:
ProjectName/
  input/
  code/
  build/
  results/
    Result1VeryDescriptive/
      <all the data>
      _Model/input
      _Model/code
      _Model/build
    Result2VeryDescriptive/
      ....
where the _Model/ directory is a snapshot of the ProjectName/ directory
(without results/) taken when the model was run. In principle I could
erase <all the data> for any Result and still re-make that Result at a
later date if I realized I actually needed the run.
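The snapshot step itself is just a copy; roughly the Python sketch
below, with directory names that simply match the layout above (they
are illustrative, not anything the model provides).

import os, shutil

def snapshot_setup(project_dir, result_dir):
    # Copy input/, code/, build/ into result_dir/_Model/ so the run
    # can be re-made later even if the raw output is deleted.
    for sub in ('input', 'code', 'build'):
        src = os.path.join(project_dir, sub)
        dst = os.path.join(result_dir, '_Model', sub)
        shutil.copytree(src, dst)

# e.g.  snapshot_setup('ProjectName',
#                      'ProjectName/results/Result1VeryDescriptive')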
I suppose I could also do:
ProjectName/
  Result1VeryDescriptive/
    input/
    code/
    build/
    results/
      <all the data>
  Result2VeryDescriptive/
    ....
but that doesn't allow me to template a bunch of similar runs as easily.
Thanks again for all your thoughts...
Cheers, Jody