[MITgcm-support] Data formats and archiving hints

Ryan Abernathey rpa at MIT.EDU
Sun Aug 2 22:45:38 EDT 2009


Hi Jody,

You may have gathered from my recent posts to this list that I have  
been wrestling with the same question. I started out several years ago  
using MDS files but have switched to NetCDF for my latest project. I  
have concluded that NetCDF is much better for several reasons:

1) MATLAB does not need to hold a whole NetCDF file in memory. With
MDS files, I often ran up against MATLAB's memory limitations when
dealing with large 64-bit 3D fields. This is not an issue with NetCDF,
as data is read from disk only when it is requested. There is also no
"load time" when opening a NetCDF file in MATLAB; it happens
essentially instantly. This is far superior to how MDS files are
handled, and it means file size is not limited by available memory.

2) Grid and coordinate information is embedded in the NetCDF files,  
along with units, descriptions, and time information (i.e. metadata).  
This means that you don't need to keep referencing the manual to  
figure out the precise spatial coordinates for each of your  
diagnostics. Very useful.

3) The output from all timesteps is collected in one file (one per
tile until the tiles are joined; see below). Combined with the ability
to output different diagnostics into the same file (using the
diagnostics package), this means you can potentially store all of the
output you wish to analyze from a particular run in a single file. I
suspect this would solve all your organizational problems.
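
To put rough numbers on point 3, here is a small Python sketch of the
file-count arithmetic. It rests on assumptions that are not stated in
this thread: MDS output is written per tile with one field per file and
a .data/.meta pair for every dump, and NetCDF output is not split into
extra files for any other reason. The function names and the example
run size are made up for illustration.

```python
def mds_file_count(n_tiles, n_fields, n_dumps):
    """Files from per-tile MDS output, assuming one field per file.

    Each tile writes a .data and a .meta file for every field at every dump.
    """
    return 2 * n_tiles * n_fields * n_dumps


def netcdf_file_count(n_tiles, n_file_groups=1):
    """Files from per-tile NetCDF output, before the tiles are joined.

    Assumes all timesteps of a file group go into one file per tile.
    """
    return n_tiles * n_file_groups


# Hypothetical run: 32 tiles, 5 fields, 100 dumps.
print(mds_file_count(32, 5, 100))
print(netcdf_file_count(32))
```

After joining the tiles, the NetCDF count drops to one file per file
group.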

However, there is one major disadvantage, especially for large runs.

* The globalFiles and useSingleCpuIO options do not work with NetCDF
output, so each tile writes its own file. When your run is done, you
have to use a script called gluemnc (available as a MATLAB or shell
script) to join the tiles into one global NetCDF file. (Your post gave
the impression that you aren't currently using either option, so this
extra step probably won't seem like a big deal anyway.)
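
Conceptually, joining tiles just means copying each tile into its slot
in a global array. The Python sketch below shows only that indexing
idea on plain nested lists. It is not how gluemnc is implemented, and
the function name, the (ix, iy) tile indexing, and the assumption that
all tiles have the same size are hypothetical choices for illustration.

```python
def join_tiles(tiles, n_tiles_x, n_tiles_y):
    """Assemble equally sized 2D tiles into one global 2D field.

    tiles maps a 0-based (ix, iy) tile index to a list of rows,
    where each row is a list of values along x.
    """
    tile_ny = len(tiles[(0, 0)])
    tile_nx = len(tiles[(0, 0)][0])
    # Pre-allocate the global field, one row list per global y index.
    field = [[None] * (tile_nx * n_tiles_x) for _ in range(tile_ny * n_tiles_y)]
    for (ix, iy), tile in tiles.items():
        if len(tile) != tile_ny or any(len(row) != tile_nx for row in tile):
            raise ValueError(f"tile {(ix, iy)} has an unexpected shape")
        for j, row in enumerate(tile):
            # Copy one tile row into its offset position in the global row.
            x0 = ix * tile_nx
            field[iy * tile_ny + j][x0:x0 + tile_nx] = row
    return field


# Four 1x2 tiles arranged as a 2x2 tile grid.
tiles = {(0, 0): [[1, 2]], (1, 0): [[3, 4]],
         (0, 1): [[5, 6]], (1, 1): [[7, 8]]}
print(join_tiles(tiles, 2, 2))
```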

Overall I would definitely recommend switching to NetCDF. The
long-term benefits will outweigh the temporary pain.
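
For runs that stay in MDS format in the meantime, the model time can
be rebuilt from the iteration number and the time step, as the original
question describes. Below is a minimal Python sketch of that
calculation (MATLAB would follow the same steps). The file-name
pattern, with a 10-digit iteration number and optional tile indices,
and the function name mds_model_time are assumptions for illustration,
so check them against your own output.

```python
import re

# Assumed naming pattern: <field>.<10-digit iteration>[.<tile>.<tile>].data or .meta
_MDS_NAME = re.compile(
    r"^(?P<field>[^.]+)\.(?P<iter>\d{10})(?:\.\d{3}\.\d{3})?\.(?:data|meta)$"
)


def mds_model_time(filename, delta_t):
    """Return model time in seconds: iteration number times the time step."""
    match = _MDS_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized MDS file name: {filename!r}")
    return int(match.group("iter")) * delta_t


# Iteration 1440 with a hypothetical 60 s time step.
print(mds_model_time("T.0000001440.001.001.data", 60.0))
```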

Hope this helps!

-Ryan

p.s. Many people apparently prefer to keep using MDS pickup files, but  
that is a different thread...


On Aug 2, 2009, at 2:48 PM, Klymak Jody wrote:

>
> Hi all,
>
> As an amateur numerical modeller using the MITgcm I thought I'd ask  
> for folks' data format and archiving ideas/advice.
>
> I do my analysis in Matlab, and am unlikely to change that.  I've  
> been writing the bare binary files (mds?) and reading those in fine  
> with the matlab rdmds.m function.  It works very well, and I  
> appreciate the effort that went into it.
>
> However, as I get to larger simulations (ahem, larger for me means  
> 16 or 32 tiles instead of 4 or 8), I start to wonder about the  
> thousands of tile files on my machine, and if that is really the  
> most efficient way for me to be storing my data.  So:
>
> Is there an inherent advantage to switching to netcdf?
>
> To be honest I'm not sure what files are produced from the netcdf  
> output - it looks like they are per-tile, and monolithic in that one  
> file contains the whole run for that tile?  If correct,  how fast  
> are they to read in matlab?  I'm running a simulation that will  
> reach 3 GB/tile.
>
> Is there more meta information?  I am always flummoxed that there is
> no "time" in the MDS meta files, so I have to figure out what dt was  
> for my run and multiply by iteration number.
>
> Parallel discussion:  How do folks organize and keep track of their  
> model runs?  I have a large number now, and quite frankly I forget  
> which ones are trash, and which ones I am using for my latest  
> paper.  Sure, I have to be more organized, but rather than reinvent
> the wheel, I'd love to hear how folks who have been doing this for a  
> while keep track.  Being lazy, automagic methods are always  
> appreciated...
>
> Thanks for any thoughts folks feel like sharing...
>
> Cheers,  Jody
