[MITgcm-support] Restarting the MITgcm
Martin Losch
mlosch at awi-bremerhaven.de
Tue Aug 10 10:15:53 EDT 2004
Uli,
I don't have much experience with multiprocessor runs, but in the
past I had to rename ALL pickup files in the case of multiple tiles.
So, if you use 2 processors, then both pickup.ckptA.001.001.data and
pickup.ckptA.002.001.data have to be renamed to
pickup.0000004800.001.001.data and pickup.0000004800.002.001.data so
that each processor/tile has its own restart file. Have a look at
STDOUT.0003 to see which file MDSREADFIELD tried to open last.
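
The renaming can be scripted. What follows is only a minimal sketch, not part of MITgcm: the function name rename_rolling_pickups and its arguments are made up for illustration. It assumes that the files sit in one run directory, that their names contain the rolling tag between dots (as in pickup.ckptA.001.001.data), and that the target name uses the iteration number padded to ten digits (as in pickup.0000004800.001.001.data). It renames every file that matches, so any matching *.meta or pickup_* files are renamed as well.

```python
from pathlib import Path


def rename_rolling_pickups(run_dir, tag, iteration, dry_run=False):
    """Rename pickup*.<tag>.* files so that <tag> becomes a 10-digit iteration.

    Hypothetical helper, not an MITgcm tool.
    Returns a list of (old_name, new_name) pairs, sorted by old name.
    """
    stamp = f"{iteration:010d}"  # e.g. 4800 -> "0000004800"
    renamed = []
    # sorted() materialises the listing before any file is renamed
    for old in sorted(Path(run_dir).glob(f"pickup*.{tag}.*")):
        # Replace only the first ".<tag>." in the file name
        new = old.with_name(old.name.replace(f".{tag}.", f".{stamp}.", 1))
        if new.exists():
            raise FileExistsError(f"refusing to overwrite {new}")
        if not dry_run:
            old.rename(new)
        renamed.append((old.name, new.name))
    return renamed
```

Calling rename_rolling_pickups(".", "ckptA", 4800, dry_run=True) first only returns the planned name pairs without touching any file, which makes it easy to check that every tile is covered before doing the real rename.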
Martin
On Aug 10, 2004, at 4:06 PM, Uli Riemenschneider wrote:
> Hi Martin,
>
> thanks for the reply. I followed your instructions, but somehow it is
> still not working. In file STDERR.0003 I find the following error
> message:
>
> (PID.TID 0003.0001) *** ERROR *** MDSREADFIELD: File does not exist
>
> I am not even sure which file it is referring to here. Any idea?
> I am running the model on multiple processors, does that make a
> difference to the restart procedure?
>
> Thanks for the help!
> Ciao
> Uli
>
> Martin Losch wrote:
>> Uli,
>> there are two run-time flags in data namelist PARM03:
>> chkptFreq and pChkptFreq, the latter sets the frequency for permanent
>> checkpoint/restart files (they have the correct time step number
>> already in their file name), the former the frequency for the rolling
>> checkpoint/restart files (with ckptA,ckptB in their names).
>> The rolling checkpoint files are overwritten alternately at each new
>> checkpoint. So if your chkptFreq = 5h, then after 5h
>> pickup.ckptA.001.001.data/meta are written, after 10h
>> pickup.ckptB.001.001.data/meta, and after 15h ckptA is overwritten,
>> then after 20h ckptB is overwritten, etc. That means that ckptA is
>> NOT always the latest checkpoint file, only half of the time (Dear
>> code czars, I hope that this is correct).
>> If you want to restart from, say, ckptA, you have to rename
>> pickup.ckptA.001.001.data to pickup.0000004800.001.001.data (don't
>> forget the *.data) and set nIter0 or startTime appropriately, as you
>> have done. In case there are some other pickup_*.ckptA.001.001.data
>> files, they have to be renamed, too.
>> For the future, however, I would recommend that if you know how
>> long you can and want to run your experiment and from where you want
>> to restart it, you set
>> chkptFreq=0.0, (to turn off rolling checkpoints)
>> pChkptFreq=whatever this time is in seconds (for permanent checkpoint
>> files)
>> If you still have problems, have a look at the standard output; it
>> will tell you which restart files are not found (just to make
>> sure that this is not the problem: for the restart you'll also need
>> all data files that you used for the run that led to the restart
>> file).
>> Hope this helps,
>> Martin
>>> Hello again,
>>>
>>> I have started taking the MIT model through some longer runs and have
>>> tried to restart them to save time, but so far unsuccessfully.
>>>
>>> The runs produce two types of pickup files: pickup.ckptA.data and
>>> pickup.ckptB.data. I figured out that the 'A' file is a collection of
>>> the data after the final time step and I am assuming the 'B' file is
>>> from some time step before that.
>>>
>>> I have tried renaming pickup.ckptA.data to pickup.000004800 and set
>>> nIter0 to 4800 to start the model at this point, but it aborts almost
>>> immediately.
>>>
>>> What do I need to do to get it going successfully?!
>>> Thanks,
>>> Uli
>> Martin Losch // mailto:mlosch at awi-bremerhaven.de
>> Alfred-Wegener-Institut für Polar- und Meeresforschung
>> Postfach 120161, 27515 Bremerhaven, Germany
>> Tel./Fax: ++49(471)4831-1872/1797
>> http://www.awi-bremerhaven.de/People/show?mlosch
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://dev.mitgcm.org/mailman/listinfo/mitgcm-support
>
>
> --
> *********************************************************
> Ulrike Riemenschneider, Postdoctoral Investigator
> Physical Oceanography Dept. MS #21
> Woods Hole Oceanographic Institution
> Woods Hole, MA 02543, USA
>
> Phone: (+1) 508 289 2916 Fax: (+1) 508 457 2181
> *********************************************************