[MITgcm-support] mitgcmuv_ad explodes during tape computations
Martin Losch
Martin.Losch at awi.de
Thu Aug 27 06:27:00 EDT 2020
Hi there,
we are running an ecco-v4-like llc90 experiment. Some parameter options differ from the true ecco-v4, and we have turned off many of the problematic code bits in the adjoint (seaice, ggl90/kpp, gmredi, saltplume).
In a two-year simulation with an objective function that is basically the mean salt content in the inner Arctic in the last month of the integration, the model blows up some 16.5 days (394 timesteps) into the reverse part of the simulation (with S/R CALC_R_STAR stopping the simulation). A closer inspection leads us to believe that this actually happens during tape computations (i.e. forward simulations), because (a) the error is triggered by CALC_R_STAR (too SMALL rStarFac[C,W,S]), which is only called from forward_step and forward_stepmd, and (b) we have output (adjDumpFreq) every 5 timesteps and the stop happens 57 timesteps earlier than the last ADJ${var} is written.
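To put the step counts above into model time, here is a minimal sketch of the conversion. The time step of 3600 s is an assumption inferred from "394 timesteps is about 16.5 days"; it is not stated in this message, and the helper name steps_to_days is purely illustrative.

```python
SECONDS_PER_DAY = 86400.0

# ASSUMPTION: deltaT is not given in the message; 3600 s is inferred
# from 394 timesteps corresponding to roughly 16.5 days.
DELTA_T = 3600.0


def steps_to_days(n_steps, delta_t=DELTA_T):
    """Convert a number of model timesteps to days of model time."""
    return n_steps * delta_t / SECONDS_PER_DAY


if __name__ == "__main__":
    # How far into the reverse sweep the stop occurs
    print("394 steps =", round(steps_to_days(394), 2), "days")
    # Gap between the stop and the last ADJ output that was written
    print("57 steps  =", round(steps_to_days(57), 2), "days")
    # Interval between adjoint dumps (every 5 timesteps)
    print("5 steps   =", round(steps_to_days(5), 2), "days")
```

With a different deltaT the day counts scale accordingly; the step counts themselves are the ones reported above.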
Short test simulations (order 100 timesteps) are fine.
My interpretation is that the forward simulation itself is fine (which it is) but perhaps only marginally stable, and that the restarts from the tapes are somehow incorrect or inaccurate, so that the forward recomputation is pushed across stability limits. Has this happened to anyone before? Do you have any suggestions for how we can debug this problem?
Martin
PS. We use
#define ALLOW_AUTODIFF_WHTAPEIO (very useful!!!!)
#define ECCO_CTRL_DEPRECATED
#define ALLOW_THETA0_CONTROL
#define ALLOW_SALT0_CONTROL
#define ALLOW_ATEMP_CONTROL
#define ALLOW_AQH_CONTROL
#define ALLOW_UWIND_CONTROL
#define ALLOW_VWIND_CONTROL
#define ALLOW_PRECIP_CONTROL
#define ALLOW_RUNOFF_CONTROL (with our own additions to make it work, see <https://github.com/mjlosch/MITgcm/tree/ctrl_runoff> if you are interested)
parameter( nchklev_1 = 4 )
parameter( nchklev_2 = 30 )
parameter( nchklev_3 = 73 )
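
For reference, the three checkpointing levels above define how many timesteps each level of recomputation spans. The sketch below only does that arithmetic; the function name is hypothetical, and the 3600 s time step is an assumption inferred from the step counts in the message, not a value given here.

```python
def checkpoint_block_sizes(nchklev_1, nchklev_2, nchklev_3):
    """Return the number of timesteps spanned by the innermost block,
    by one level-2 block, and by all three levels together."""
    inner = nchklev_1
    middle = nchklev_1 * nchklev_2
    total = nchklev_1 * nchklev_2 * nchklev_3
    return inner, middle, total


if __name__ == "__main__":
    inner, middle, total = checkpoint_block_sizes(4, 30, 73)
    print("innermost block:", inner, "steps")
    print("level-2 block:  ", middle, "steps")
    print("all levels:     ", total, "steps")

    # ASSUMPTION: deltaT = 3600 s (not stated in the message)
    delta_t = 3600.0
    print("all levels cover", total * delta_t / 86400.0, "days")
```

Under these settings a level-2 block spans 120 timesteps, so a stop 57 timesteps before the last adjoint output would fall inside the forward recomputation of one such block. That is only a consistency check on the numbers, not a diagnosis. It may also be worth comparing the total against nTimeSteps of the run.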