[MITgcm-support] Data formats and archiving hints

Holly Dail hdail at MIT.EDU
Mon Aug 3 17:23:34 EDT 2009


Hi Jody -

I too lose track of model configurations / data / etc after a while.   
I'm a newbie, but thought I'd still send my 2 cents.

I try to update my code base regularly (preferably once/month) and  
build fresh so that I keep up with bug fixes and remember how  
everything is cobbled together at least that often.  Then, I organize  
by the month of the build, the build type, and then the run.  For  
example:
data_storage/2009_05_mitGCM/2x2nAtlAdj_optim/run200905_019
Of course, I have to stop updating for a given project once I need a
set of mutually consistent runs.
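
A minimal sketch of how the month / build type / run layout above could be
generated rather than typed by hand. The function name and its arguments are
hypothetical, and the zero-padded three-digit run counter is only inferred
from the single example path.

```python
from pathlib import Path


def run_dir(root, build_month, build_type, run_number):
    """Return root/<YYYY_MM>_mitGCM/<build_type>/run<YYYYMM>_<NNN>.

    build_month is a string such as "2009_05" (an assumed convention).
    """
    year, month = build_month.split("_")
    run_name = f"run{year}{month}_{run_number:03d}"
    return Path(root) / f"{build_month}_mitGCM" / build_type / run_name


if __name__ == "__main__":
    # Prints: data_storage/2009_05_mitGCM/2x2nAtlAdj_optim/run200905_019
    print(run_dir("data_storage", "2009_05", "2x2nAtlAdj_optim", 19).as_posix())
```

Because the run name carries the build month and a counter, it stays unique
across projects as long as the counter is shared between them.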

I keep a text file open all the time where I log details for each run  
(where it was run, changes made, how it went, etc); when I do a new  
update/build I start a new file.  When I'm working on multiple
projects at once I still sequence my runs this way, so that there is
only ever one run200905_019 (for example) on any of the disks I use
for runs and analysis.  This has been working moderately well,
though there are still plenty of kinks and I still lose track.  I need  
to find a better system for storing the model configuration for each  
run as you do; I tend to use a lot of soft links to save disk space,  
but they can sometimes be pretty useless for reconstructing what went  
into a run.  I do like being able to grep the run log files and when I  
need to clean up disk space the monthly organization and logs are  
helpful.  Come to think of it, 4.5 TB is a handy solution ;-)
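
One possible way to make soft links less opaque afterwards is to record where
each link pointed at run time. The sketch below is only an illustration: the
function names and the manifest filename links_manifest.txt are made up here,
and it assumes a filesystem that supports symbolic links.

```python
import os
from pathlib import Path


def link_manifest(run_dir):
    """Return sorted 'relative/path -> resolved target' lines,
    one for every symbolic link found under run_dir."""
    run_dir = Path(run_dir)
    lines = []
    # os.walk does not descend into linked directories by default,
    # but it still lists them in dirnames, so they are recorded too.
    for dirpath, dirnames, filenames in os.walk(run_dir):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                target = os.path.realpath(path)
                lines.append(f"{path.relative_to(run_dir)} -> {target}")
    return sorted(lines)


def write_manifest(run_dir, filename="links_manifest.txt"):
    """Write the manifest into run_dir and return its path."""
    out = Path(run_dir) / filename
    out.write_text("\n".join(link_manifest(run_dir)) + "\n")
    return out
```

The manifest is plain text, so it can be searched with grep alongside the run
logs. It records only where each link pointed, not the contents of the target.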

Holly




> Good to know I am not alone!  I've found a 4.5 TB NAS has made life
> much easier for my humble needs.  But I know I have GBs of wasted
> model output that I know will never be part of a paper, but I'm just
> not sure *which* model output.  My solution for now is to organize  
> as follows:
>
> ProjectName/
>    input/
>    code/
>    build/
>    results/
>        Result1VeryDescriptive/
>             <all the data>
>             _Model/input
>             _Model/code
>             _Model/build
>        Result2VeryDescriptive/
>           ....
>
> where the _Model/ directory is a snapshot of the ProjectName/  
> directory (w/o the results) when the model was run.  Theoretically I  
> could erase <all the data> for any Result and still re-make the  
> Result at a later date if I realized I actually need that run.
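
The _Model/ snapshot described above could be sketched roughly as follows.
The function name is hypothetical, and copying with shutil.copytree is an
assumption made for illustration, not necessarily how the snapshot was
actually taken.

```python
import shutil
from pathlib import Path

# Directories copied into the snapshot, following the layout in the message.
SNAPSHOT_DIRS = ("input", "code", "build")


def snapshot_model(project_dir, result_name):
    """Copy input/, code/ and build/ into results/<result_name>/_Model/."""
    project = Path(project_dir)
    dest = project / "results" / result_name / "_Model"
    for name in SNAPSHOT_DIRS:
        src = project / name
        if src.is_dir():
            # symlinks=False copies the files that links point to, so the
            # snapshot does not depend on files outside the project.
            # copytree raises FileExistsError if the destination already
            # exists, which protects an earlier snapshot from being
            # overwritten.
            shutil.copytree(src, dest / name, symlinks=False)
    return dest
```

Calling snapshot_model("ProjectName", "Result1VeryDescriptive") just before a
run would fill in the _Model/ tree. Copying link targets costs disk space, and
a broken link makes copytree raise an error.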
>
> I suppose I could also do:
>
> ProjectName/
>    Result1VeryDescriptive/
>       input/
>       code/
>       build/
>       results/
>           <all the data>
>    Result2VeryDescriptive/
>       ....
> but that doesn't allow me to template a bunch of similar runs as  
> easily.
>
> Thanks again for all your thoughts...
>
> Cheers,  Jody