[MITgcm-support] Data formats and archiving hints
Holly Dail
hdail at MIT.EDU
Mon Aug 3 17:23:34 EDT 2009
Hi Jody -
I too lose track of model configurations / data / etc after a while.
I'm a newbie, but thought I'd still send my 2 cents.
I try to update my code base regularly (preferably once/month) and
build fresh so that I keep up with bug fixes and remember how
everything is cobbled together at least that often. Then, I organize
by the month of the build, the build type, and then the run. For
example:
data_storage/2009_05_mitGCM/2x2nAtlAdj_optim/run200905_019
Of course I need to stop updating for a given project once I need a
bunch of consistent runs.
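For what it's worth, the naming scheme above could be sketched in Python roughly as follows. This is only an illustration: the function name run_dir and its arguments are made up here, and the "_mitGCM" suffix and three-digit run counter are simply read off the example path.

```python
from pathlib import Path


def run_dir(root, build_month, build_type, run_number):
    """Build root/<YYYY_MM>_mitGCM/<build_type>/run<YYYYMM>_<NNN>.

    build_month is a string like "2009_05" (the month of the code update/build).
    The layout is an assumption taken from the single example path above.
    """
    year, month = build_month.split("_")
    run_name = f"run{year}{month}_{run_number:03d}"
    return Path(root) / f"{build_month}_mitGCM" / build_type / run_name


# Reproduces the example path given above.
print(run_dir("data_storage", "2009_05", "2x2nAtlAdj_optim", 19).as_posix())
```

Keeping the build month in both the directory name and the run name means a run name stays unique even when it is copied to another disk.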
I keep a text file open all the time where I log details for each run
(where it was run, changes made, how it went, etc); when I do a new
update/build I start a new file. When I'm working on multiple
projects at once I still sequence my runs this way so that there is
only ever one run200905_019 (for example) on any of the disks I work on
for the runs and analysis. This has been working moderately well,
though there are still plenty of kinks and I still lose track. I need
to find a better system for storing the model configuration for each
run as you do; I tend to use a lot of soft links to save disk space,
but they can sometimes be pretty useless for reconstructing what went
into a run. I do like being able to grep the run log files and when I
need to clean up disk space the monthly organization and logs are
helpful. Come to think of it, 4.5 Tb is a handy solution ;-)
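As a rough sketch of the grep-able run log described above (not the actual file I keep; the function names, the tab-separated layout, and the field order are all assumptions made for illustration), one line per run could be appended and searched like this:

```python
from datetime import date
from pathlib import Path


def log_run(log_file, run_name, host, changes, outcome):
    """Append one tab-separated line per run so the file stays easy to grep."""
    entry = "\t".join([date.today().isoformat(), run_name, host, changes, outcome])
    with Path(log_file).open("a", encoding="utf-8") as f:
        f.write(entry + "\n")


def find_runs(log_file, text):
    """Return every log line containing `text` (a plain substring match, like grep -F)."""
    with Path(log_file).open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if text in line]
```

One log file per update/build (for example a hypothetical runlog_2009_05.txt) matches the monthly organization; find_runs(log_file, "run200905_019") then pulls up the notes for that run.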
Holly
> Good to know I am not alone! I've found a 4.5 Tb NAS has made life
> much easier for my humble needs. But I know I have Gbs of wasted
> model output that I know will never be part of a paper, but I'm just
> not sure *which* model output. My solution for now is to organize
> as follows:
>
> ProjectName/
>   input/
>   code/
>   build/
>   results/
>     Result1VeryDescriptive/
>       <all the data>
>       _Model/input
>       _Model/code
>       _Model/build
>     Result2VeryDescriptive/
>       ....
>
> where the _Model/ directory is a snapshot of the ProjectName/
> directory (w/o the results) when the model was run. Theoretically I
> could erase <all the data> for any Result and still re-make the
> Result at a later date if I realized I actually need that run.
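A minimal sketch of the _Model/ snapshot idea quoted above, assuming the ProjectName/ layout with input/, code/, build/ and results/ side by side. The function name snapshot_model is hypothetical. It relies on shutil.copytree, which by default copies the files that soft links point to rather than the links themselves, so the snapshot stays usable even if the link targets later move.

```python
import shutil
from pathlib import Path


def snapshot_model(project_dir, result_name, parts=("input", "code", "build")):
    """Copy input/, code/ and build/ into results/<result_name>/_Model/.

    Assumes the project layout quoted above; the _Model/ directory for this
    result must not already contain the copied subdirectories.
    """
    project = Path(project_dir)
    model_dir = project / "results" / result_name / "_Model"
    for part in parts:
        src = project / part
        if src.is_dir():
            # symlinks=False (the default) copies the linked files' contents,
            # so the snapshot does not depend on the link targets staying put.
            shutil.copytree(src, model_dir / part, symlinks=False)
    return model_dir
```

Copying link targets costs the disk space that the links were saving, so this trades space for being able to reconstruct a run later. A dangling link makes copytree raise an error, which at least flags the problem at snapshot time.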
>
> I suppose I could also do:
>
> ProjectName/
>   Result1VeryDescriptive/
>     input/
>     code/
>     build/
>     results/
>       <all the data>
>   Result2VeryDescriptive/
>     ....
> but that doesn't allow me to template a bunch of similar runs as
> easily.
>
> Thanks again for all your thoughts...
>
> Cheers, Jody
>
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support