[MITgcm-support] checkpointing problem

Tue Feb 10 17:27:40 EST 2004

Fiona,

I had a look at the tamc_output file. At first sight it appears
that you have serious recomputation problems at the outer loop
levels (lev2, lev3), e.g. several instances of

TAMC RECOMPUTATION 2 WARNING DOLOOP_STMT tamc_code.f:374662 in the_main_loop
 extensive recomputations are required

Whether that actually results in problems or is just an artefact
of TAMC's very poor WARNING messaging is hard to tell
without looking at the adjoint code.
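To illustrate why lost checkpoints at the outer levels (lev2, lev3) are so expensive, here is a toy cost model in Python. It is a sketch under stated assumptions, not TAMC's or MITgcm's actual scheme: the run is assumed to consist of n3 level-3 intervals, each split into n2 level-2 intervals of n1 timesteps (n1, n2, n3 and the function name are hypothetical). It counts forward model steps and compares keeping every checkpoint against the suspected failure, where only the most recently written checkpoint of each level survives. Losses at level 1 are not modelled.

```python
def count_forward_steps(n1: int, n2: int, n3: int, keep_all: bool = True) -> int:
    """Count forward model steps in a toy three-level checkpointed adjoint run.

    Hypothetical model: n3 level-3 intervals, each holding n2 level-2
    intervals of n1 timesteps. If keep_all is False, only the most recently
    written checkpoint of each level survives, so any earlier checkpoint must
    be rebuilt by recomputing from the start of the enclosing interval.
    """
    # Initial sweep from timestep 1, writing the level-3 checkpoints.
    steps = n1 * n2 * n3
    for k3 in reversed(range(n3)):
        if not keep_all and k3 < n3 - 1:
            # Level-3 checkpoint k3 was overwritten: recompute from timestep 1.
            steps += k3 * n2 * n1
        # Sweep this level-3 interval, writing its level-2 checkpoints.
        steps += n2 * n1
        for k2 in reversed(range(n2)):
            if not keep_all and k2 < n2 - 1:
                # Level-2 checkpoint k2 was overwritten: recompute from the
                # start of the current level-3 interval.
                steps += k2 * n1
            # Sweep this level-2 interval, recording level-1 tapes.
            steps += n1
    return steps


if __name__ == "__main__":
    for keep in (True, False):
        print(f"keep_all={keep}: {count_forward_steps(2, 3, 4, keep)} forward steps")
```

For a 24-step run (n1=2, n2=3, n3=4) this model gives 72 forward steps when all checkpoints are kept and 98 when only the latest one per level survives. The extra level-3 cost grows quadratically with n3, since each overwritten level-3 checkpoint forces a rerun from timestep 1.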

A first suggestion (to rule out simple things):
try to get rid of the warnings related to 'mythid'.
There are a few of them, and they may or may not be related
(I don't know how you got them; mythid is usually benign
since it's passive):

TAMC RECOMPUTATION 1 WARNING IF_STMT tamc_code.f:363916 in solve_for_pressure
TAMC RECOMPUTATION 1 WARNING CALL_STMT tamc_code.f:79385 in forward_step
TAMC RECOMPUTATION 1 WARNING IF_STMT tamc_code.f:79383 in forward_step
TAMC RECOMPUTATION 1 WARNING CALL_STMT tamc_code.f:374505 in the_main_loop
TAMC RECOMPUTATION 1 WARNING CALL_STMT tamc_code.f:374491 in the_main_loop

Not knowing your code, my guess is that you introduced new
subroutines (for the cost function?) and that something in them
requires recomputation.

To help further, I would have to see the adjoint code
produced by TAMC, as well as the routines
the_main_loop.F and forward_step.F.

Cheers
-Patrick

Quoting Fiona McLay <m221003 at regen.dkrz.de>:

> Patrick,
> 
> Thanks for getting back to me.  Here are the files.  I am using TAMC and
> not TAF.
> 
> I don't think the model is recomputing from the beginning each time, but
> it doesn't appear to be going back to the correct checkpoint. It seems to
> recompute from the first timestep every time it should read a level 3
> checkpoint, yet it stores the level 3 checkpoint it wanted to read.
> Every time it should read a level 2 checkpoint, it recomputes from the
> last level 3 checkpoint, but again the level 2 checkpoint is stored. A
> similar thing happens with the level 1 checkpoints. So in the current
> case, where I have 4 level 3 checkpoints, it recomputes from the 1st
> timestep 4 times instead of only once (while storing all level 3
> checkpoints). I hope this makes sense.
> 
> Thanks
> 
> Fiona
> 
> On Mon, 9 Feb 2004 heimbach at mit.edu wrote:
> 
> > Fiona,
> >
> > it's hard to tell what's going on without more info.
> > One obvious guess is that you have
> > excessive recomputations going on, which would explain
> > going back to timestep one at each timestep.
> > Could you send me the following files
> > - taf_ad.log
> > - ECCO_CPPOPTIONS.h
> > - CPP_OPTIONS.h
> >
> > -Patrick
> >
> > PS:
> > I assume you are using TAF; is that right?
> > If you are using TAMC, there should be a file called
> > tamc_ad.log or <something>.prot
> > Please send me this one then.
> >
> > Quoting Fiona McLay <m221003 at regen.dkrz.de>:
> >
> > > Hello,
> > >
> > > I am trying to use the adjoint of the MIT model with an eddy-resolving
> > > channel. I can get the adjoint model to run, but I appear to be having
> > > trouble with the checkpointing scheme. It looks as if the checkpoints are
> > > being continuously overwritten, so that it can only store one checkpoint
> > > of each level at a time, leading to many recomputations. It goes back to
> > > the 1st timestep every time it needs a level 3 checkpoint, and to the
> > > last level 3 checkpoint every time it needs a level 2 checkpoint.
> > >
> > > Any idea what I am doing wrong?  I have checked that I have
> > > allow_tamc_checkpointing defined, and that I have the right number of
> > > checkpoints for the length of the run.
> > >
> > > Any help would be much appreciated.
> > >
> > > Thanks
> > >
> > > Fiona
> > >
> > > _______________________________________________
> > > MITgcm-support mailing list
> > > MITgcm-support at mitgcm.org
> > > http://dev.mitgcm.org/mailman/listinfo/mitgcm-support
> > >
> >
>
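On Fiona's check that the number of checkpoints matches the length of the run: as far as we recall, MITgcm's three-level scheme sets the checkpoint counts in tamc.h as nchklev_1, nchklev_2 and nchklev_3, and their product must be at least the number of timesteps. The Python sketch below encodes that rule; treat the parameter names as an assumption to verify against your own tamc.h, and the helper functions as hypothetical.

```python
def checkpoints_cover_run(n_timesteps: int, nchklev_1: int,
                          nchklev_2: int, nchklev_3: int) -> bool:
    """Assumed rule: nchklev_1 * nchklev_2 * nchklev_3 >= n_timesteps."""
    return nchklev_1 * nchklev_2 * nchklev_3 >= n_timesteps


def min_nchklev_3(n_timesteps: int, nchklev_1: int, nchklev_2: int) -> int:
    """Smallest outer-level count that covers the run, given the inner levels."""
    inner = nchklev_1 * nchklev_2
    # Integer ceiling division, avoiding floating point.
    return -(-n_timesteps // inner)


if __name__ == "__main__":
    n = 100  # hypothetical run length in timesteps
    n3 = min_nchklev_3(n, 5, 5)
    print(n3, checkpoints_cover_run(n, 5, 5, n3))
```

A correct count only rules out running out of checkpoint slots; it does not rule out the overwriting behaviour Fiona describes.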