[MITgcm-devel] optim_m1qn3
Matthew Mazloff
mmazloff at ucsd.edu
Fri May 18 14:11:00 EDT 2012
Hi Martin,
Can I please get some clarification as to what you mean by
"optim_m1qn3 gets stuck and terminates the optimization with "output
mode 6", which usually means that the gradient is not accurate enough
to find a new descent direction"? How does it determine the
accuracy of the gradient? Are you checking the non-linearity of the
cost function by comparing the predicted costfinal with the actual
costfinal?
It actually sounds like the two line searches may be
complementary: one rigorous, which can fail if the cost function becomes
too non-linear, and one that cheats and sometimes wins in practice.
So a hybrid approach may be ideal.
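For what it's worth, here is the kind of nonlinearity check I have in
mind, as a toy sketch with a made-up 1-D cost function (my own
illustration, not anything from the actual code):

```python
# Toy sketch: compare the line-search model's predicted cost with the
# actual cost after a trial step. A ratio far from 1 flags that the
# cost function is too non-linear over this step length.
def f(x):  return x**4           # made-up cost function
def df(x): return 4 * x**3       # its gradient

x0, t = 1.0, 0.5                 # current point and trial step size
d = -df(x0)                      # steepest-descent direction, d = -4
f0, gd = f(x0), df(x0) * d       # f0 = 1, directional derivative g.d = -16

predicted = f0 + t * gd          # linear model of the cost: 1 - 8 = -7
actual    = f(x0 + t * d)        # f(-1) = 1: no decrease at all
ratio     = (actual - f0) / (predicted - f0)
print(predicted, actual, ratio)  # ratio 0.0: the linear model was useless here
```

A ratio near 1 would say the step was well modeled; here it is 0, i.e.
the step overshot into territory where the linear prediction fails.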
Thanks
Matt
On May 7, 2012, at 11:39 PM, Martin Losch wrote:
> Dear optimizers,
>
> I have fiddled some more with the code in MITgcm_contrib/mlosch/
> optim_m1qn3 and I am fairly confident that the code is now working
> properly. That is to say that in the "testbed" that I constructed I
> get identical results with the original "online" m1qn3 and with my
> modification m1qn3_offline.F
>
> "Real life" tests give mixed results. I am attaching 5 figures. In
> all plots the "b" experiment (the green line) is with optim_m1qn3
> and the blue one with the standard optim/lsopt. A "simulation" is
> one full run of "mitgcmuv_ad". The 4 cmp_cf_opt*.png
> experiments are with a regional model with open boundaries (the
> control parameters) and ice shelf cavities. With this experiment I
> have the problem that lsopt often returns a control vector that is
> too extreme (see e.g. opt28 and opt29) for the model to swallow, so
> it explodes in the forward integration (sooner or later). All
> cmp_cf_opt* experiments have this problem, and the blue line stops
> whenever this happens. In one case this happens after 90 simulations,
> but in opt18 already after very few simulations. optim_m1qn3, on the
> other hand, does much better in opt18 and opt27 (although it seems to
> get stuck: all simulations are used on the line search and very
> little improvement is achieved) and not so well in opt28 and opt29,
> where it seems to get stuck well above the lowest cost values found
> with lsopt. But all experiments are still running, and there is some
> hope that the cost function will go down some more.
>
> The 5th figure, cmp_cf_MOM17.png, shows a run with a global cs32
> simulation with seaice/gmredi/kpp (gmredi/kpp and seaice_dynamics
> are turned off in the adjoint, I think). There are 4 experiments:
> MOM17 (blue line) uses lsopt and nfunc=7, MOM17a (red) uses lsopt
> and nfunc=1 (so here lsopt really does only one simulation per
> iteration, and lsopt knows about it), MOM17c (black) uses lsopt and
> nfunc=100 (just a test), and MOM17b (green) uses optim_m1qn3.
> Obviously lsopt with nfunc=7 and nfunc=1 does much better than m1qn3
> (we only allowed each to do 20 simulations). It is interesting that
> nfunc=1 seems to be the better choice in this case.
>
> optim_m1qn3 gets stuck and terminates the optimization with "output
> mode 6", which usually means that the gradient is not accurate
> enough to find a new descent direction (plausible, as we only
> compute an approximate gradient with gmredi etc. turned off). It can
> also mean that my fiddling with m1qn3 broke it, and I still need to
> spend more time on that, but I could not find a simple case (cost
> function) where m1qn3_offline fails.
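> To illustrate what I mean by the gradient being too inaccurate, here
> is a made-up two-variable sketch (my own toy numbers, nothing from
> the actual code):

```python
# Hypothetical illustration: when the gradient error is larger than the
# gradient itself, a direction built from the approximate gradient need
# not be a descent direction for the true cost, and any rigorous line
# search must then fail (m1qn3's "output mode 6" situation).
g_true   = [1.0, 0.0]        # "true" gradient at the current point
g_approx = [-1.0, 0.2]       # approximate gradient (error dominates)
d = [-g for g in g_approx]   # steepest-descent direction from g_approx

dot = sum(a * b for a, b in zip(g_true, d))
print(dot)                   # 1.0 > 0: an ASCENT direction for the true cost
```

> With an error this large no step length along d can reduce the true
> cost, so the optimizer has to give up.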
>
> A different interpretation is that with optim_m1qn3 I quickly arrive
> at a local minimum and get stuck. I think that lsopt actually breaks
> the BFGS algorithm (since the line search always uses the same
> simulation in the nfunc-loop, i.e., for each new x+t*dx that lsline
> offers, simul returns the same cost function and gradient), and the
> model therefore accidentally gets pushed out of local minima, whereas
> m1qn3 does not, because it is more accurate and stays close to a
> (even shallow) minimum once it has found it.
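> To make the nfunc-loop point concrete, here is a toy sketch (mine,
> not the actual lsopt code) of what happens to a Wolfe-type test when
> simul returns frozen values; the numbers are made up:

```python
# In the offline setup simul cannot rerun the model, so every trial
# step t in the nfunc-loop gets back the SAME cost f0 and directional
# derivative gd that were computed at the current iterate x.
f0 = 10.0           # cost at x
gd = -5.0           # directional derivative g.d at x (descent: < 0)
c1, c2 = 1e-4, 0.9  # typical Wolfe constants

for t in (1.0, 0.5, 0.25):
    f_new, gd_new = f0, gd  # frozen values: the model was never rerun
    armijo    = f_new <= f0 + c1 * t * gd  # f0 <= f0 - small  -> False
    curvature = gd_new >= c2 * gd          # -5.0 >= -4.5      -> False
    print(t, armijo, curvature)
```

> Neither condition can ever hold with frozen values, so the loop just
> burns its nfunc budget, and the step that is finally taken was never
> really validated; hence the accidental kicks out of local minima.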
>
> I am not sure how to continue. Obviously there are cases where lsopt
> fails, but there are also cases where m1qn3 does not do too well;
> moreover, since we are sometimes working with approximate gradients,
> we might not want to be told that the gradient is too inaccurate for
> further descent.
> It would be good to see a few other examples. Maybe you can try to
> run your problems with optim_m1qn3. Instructions are below.
>
> Martin
>
> cvs co MITgcm_contrib/mlosch/optim_m1qn3
> Compiling is simpler than for lsopt/optim: edit the Makefile to
> adjust it to your system/compiler and change the include path to
> point to your build directory (just as in optim/Makefile), then run
> make depend && make
>
> The resulting optim.x (same name) takes the same input files, and
> most of the variables in data.optim can stay as they are (eventually
> I plan to have a separate data.m1qn3 and not use data.optim any
> more, but for now it is easier for comparisons). There are only TWO
> things that require attention:
> - numiter (in data.optim) must be larger than one. It is now the
> number of optimization iterations that you are going to allow ***in
> total***. I would put in something large, like 1000.
> - m1qn3 produces an output file (m1qn3_output.txt; I hard-coded the
> name for now) that is reopened each time you run optim.x, so
> make sure that you keep or restore a copy in the working directory.
> One could modify optim_sub.F to redirect the m1qn3 output to stdout,
> but I like it better the way it is implemented now.
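> Just to be explicit about the bookkeeping, a throwaway sketch
> (m1qn3_output.txt is the hard-coded name; the backup naming scheme
> and the run steps in the comments are placeholders of my choosing):

```python
import os
import shutil

# Sketch of the file bookkeeping around one offline optimization step.
def backup_m1qn3_state(workdir, itno):
    """Keep a per-iteration copy so a later optim.x run can pick it up."""
    src = os.path.join(workdir, "m1qn3_output.txt")
    dst = os.path.join(workdir, "m1qn3_output.txt.%04d" % itno)
    shutil.copy2(src, dst)
    return dst

def restore_m1qn3_state(workdir, itno):
    """Put the saved copy back before rerunning optim.x for iteration itno."""
    src = os.path.join(workdir, "m1qn3_output.txt.%04d" % itno)
    shutil.copy2(src, os.path.join(workdir, "m1qn3_output.txt"))

# Per iteration one would then roughly:
#   1. run mitgcmuv_ad             (produces cost and gradient)
#   2. restore_m1qn3_state(...)    (if the working copy was clobbered)
#   3. run optim.x                 (reads and updates m1qn3_output.txt)
#   4. backup_m1qn3_state(...)
```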
>
> <cmp_cf_opt18.png> <cmp_cf_opt27.png> <cmp_cf_opt28.png> <cmp_cf_opt29.png> <cmp_cf_MOM17.png>
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel