[MITgcm-support] Data formats and archiving hints

Klymak Jody jklymak at uvic.ca
Mon Aug 3 15:38:36 EDT 2009


Hi All,

Thanks a lot for your comments and suggestions.

I'll give netcdf a try in the next few days, though I am a little
confused about how well it will scale.  My grids are Cartesian (and
2-D; I do internal wave modelling, much of it in collaboration with
Sonya Legg).  For instance, my current run will be 40 GB over 16 tiles.

On 3-Aug-09, at 6:16 AM, Jean-Michel Campin wrote:

> Regarding the time, as Christopher wrote, it's in the meta file
> but it's currently ignored by rdmds. I guess we could change it
> if this is found to be useful.

Is this new?  I see timeStepNumber, which I already know from the  
filename.  But I need to know dt*timeStepNumber.

Just so you don't think I'm a numbskull: I sometimes have runs with
one dt, and higher-resolution runs at a smaller dt.  Occasionally
there are enough different dt's that writing processing scripts would
be a heck of a lot easier if there were an automated way to figure out
which dt I'd used, without having to track down the original "data"
file and hard-code the dt by hand for whichever run I am analyzing.
And sometimes I have been sloppy enough to overwrite the original
"data" file and have had to infer the dt from the timeStepNumber and
the actual results.  It seems the physical time the model thinks it
has reached should be stored somewhere.
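
For the record, the kind of helper I have in mind looks something like
this rough Python sketch: it first looks for a timeInterval entry in
the .meta file (which I gather newer versions write), and otherwise
falls back to deltaT from the "data" namelist times the
timeStepNumber.  The exact .meta field layout and the assumption that
"data" sets a plain deltaT are guesses from my own output, so treat it
as a sketch rather than something guaranteed to work everywhere:

    import re

    def model_time(meta_file, data_file="data"):
        """Physical model time (s) for one mds dump.

        Prefers the timeInterval entry that newer .meta files carry
        (taking the first number if there are two); otherwise falls
        back to deltaT (from the "data" namelist) multiplied by the
        timeStepNumber in the .meta file.
        """
        meta = open(meta_file).read()

        # something like:  timeInterval = [  3.731400000000E+06  ];
        m = re.search(r"timeInterval\s*=\s*\[\s*([0-9.eE+-]+)", meta)
        if m:
            return float(m.group(1))

        it = re.search(r"timeStepNumber\s*=\s*\[\s*(\d+)", meta)
        dt = re.search(r"deltaT\s*=\s*([0-9.eE+-]+)", open(data_file).read())
        return float(dt.group(1)) * int(it.group(1))

    # e.g.  model_time('T.0000036000.meta')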

>> On really large integrations (order 500x500 surface nodes and more), I
>> do not use NetCDF, because you quickly run into a netcdf file-size
>> limitation of 2 GB (the MITgcm netcdf interface can handle that by
>> opening new files once you reach this limit, but it defeats some of the
>> purpose of netcdf). This limitation has been lifted in more recent
>> versions of netcdf (3.6 I think), but only to 4 GB, as far as I know.
>> When you deal with really large files (order 1 GB for one 3-D field),
>> netcdf becomes pretty useless as far as I am concerned.

Can the netCDF file size be tailored to a certain number of
time-dumps?  I deal with tidal integrations, so a file every couple of
tidal cycles would be ideal.  I didn't see any parameter in the (old)  
docs I have, just a warning to reduce write intervals or restart from  
pickups (that sounds like fun).  If this cannot be done, please  
consider this a vote for such a feature!
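
For what it's worth, the arithmetic I'd want such a parameter for is
roughly this back-of-the-envelope sketch (the grid size, field count,
and dump frequency below are made-up numbers, and I'm assuming the M2
period), just to see how many dumps fit in a couple of tidal cycles
and whether the resulting file would bump into the 2-4 GB limit:

    # back-of-the-envelope: dumps per file and file size for tidal output
    M2_PERIOD = 44714.0      # s, M2 tidal period (~12.42 h)
    dump_freq = 1800.0       # s, dumpFreq-style output interval (made up)
    nx, nz = 2000, 200       # 2-D grid size (made up)
    n_fields = 4             # e.g. U, W, T plus a mixing diagnostic (made up)
    cycles_per_file = 2      # "a file every couple of tidal cycles"

    dumps_per_file = int(round(cycles_per_file * M2_PERIOD / dump_freq))
    bytes_per_dump = n_fields * nx * nz * 8      # 64-bit reals
    file_gb = dumps_per_file * bytes_per_dump / 1e9

    print(f"{dumps_per_file} dumps/file -> about {file_gb:.2f} GB per file")
    # anything much beyond ~2 GB would hit the classic netcdf limit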

My other hesitation is that I modified pkg/pp81 to write mds files
directly, and I guess I'd have to incorporate that into the
diagnostics package instead (where it really should be, I suppose).
(Not that I use pp81 itself; I just cannibalized the code for my own
mixing scheme.)

>>
>> Your other problem: I am afraid there is no automatism. You'll need to
>> document your runs yourself, painfully boring as it is (I create
>> hand-written tables with notes, comments, and parameter values, which
>> end up in folders that I can never find when I need them).

Good to know I am not alone!  I've found that a 4.5 TB NAS has made
life much easier for my humble needs.  But I know I have GBs of model
output that will never be part of a paper; I'm just not sure *which*
model output.  My solution for now is to organize as follows:

ProjectName/
    input/
    code/
    build/
    results/
        Result1VeryDescriptive/
             <all the data>
             _Model/input
             _Model/code
             _Model/build
        Result2VeryDescriptive/
           ....

where the _Model/ directory is a snapshot of the ProjectName/
directory (w/o the results) taken when the model was run.
Theoretically I could erase <all the data> for any Result and still
re-make that Result at a later date if I realized I actually needed
the run.
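
The snapshot step itself is easy to script; something like this sketch
(the paths match the layout above, and it just assumes a reasonably
recent Python with shutil):

    import shutil
    from pathlib import Path

    project = Path("ProjectName")
    snapshot = project / "results" / "Result1VeryDescriptive" / "_Model"

    # copy input/, code/ and build/ into the result's _Model/ snapshot,
    # leaving results/ (and all the data) out of it; assumes the
    # snapshot directories don't exist yet
    for sub in ("input", "code", "build"):
        shutil.copytree(project / sub, snapshot / sub)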

I suppose I could also do:

ProjectName/
    Result1VeryDescriptive/
       input/
       code/
       build/
       results/
           <all the data>
    Result2VeryDescriptive/
        ....

but that doesn't allow me to template a bunch of similar runs as easily.

Thanks again for all your thoughts...

Cheers,  Jody

