[MITgcm-devel] optim_m1qn3
Martin Losch
Martin.Losch at awi.de
Mon May 21 04:04:44 EDT 2012
Hi Matt,
can I refer you to the documentation of m1qn3?
<https://who.rocq.inria.fr/Jean-Charles.Gilbert/modulopt/optimization-routines/m1qn3/m1qn3.pdf>
Section 4.4 More on some output modes
[...]
The output mode omode = 6 can have various origins.
• It can come from a mistake in the calculation of the gradient. This is likely to be the case when the solver stops with that output mode after very few iterations.
• If the number of step-size trials at the last iteration is small, this can mean that dxmin has been chosen too large. In that case, decreasing dxmin should have the clear effect of increasing the number of iterations.
• It can also come from rounding error in the simulator. The precision on the optimal solution that can be reached by the solver depends indeed on the precision of the simulator (the amount of rounding errors made in the computation).
[...]
There is more in this section, but the above agrees with my experience: whenever m1qn3 ended with output mode 6, I usually had to improve the accuracy of the gradient computation (find and fix bugs in the adjoint model). But you are right, if the problem is very nonlinear and two gradients that are very close in x-space are very different, this output mode is probably also what happens.
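As a concrete illustration of the first point from the m1qn3 documentation, the standard sanity check is to compare the adjoint gradient against finite differences of the cost function. A minimal sketch with a toy quadratic cost (my own example; in a real setup J would be a full forward run and gradJ the adjoint model, so only a few components would be checked):

```python
import numpy as np

def gradient_check(J, gradJ, x, eps=1e-6):
    """Compare a (supposedly adjoint) gradient against central finite
    differences of the cost function J.

    Returns the worst relative error over all components; values much
    larger than sqrt(machine epsilon) hint at a bug in the gradient code.
    """
    g_adj = gradJ(x)
    worst = 0.0
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g_fd = (J(x + e) - J(x - e)) / (2.0 * eps)  # central difference
        rel = abs(g_fd - g_adj[i]) / max(abs(g_fd), abs(g_adj[i]), 1e-30)
        worst = max(worst, rel)
    return worst

# toy cost function: J(x) = 0.5*|x|^2, whose exact gradient is x
J = lambda x: 0.5 * np.dot(x, x)
gradJ = lambda x: x
x0 = np.array([1.0, -2.0, 0.5])
print(gradient_check(J, gradJ, x0))  # tiny relative error: gradients agree
```

When the adjoint has a bug, this number jumps by many orders of magnitude, which is exactly the situation that tends to end in output mode 6.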
Incidentally, I ran tutorial_global_oce_optim (because the cost function/gradient computation carries less baggage than in my runs) with lsopt/optim and m1qn3; see the attached figure, which shows the cost function values of all simulations (not iterations). The initial cost function is 53.5, and I use fmin=5.74 in run_lsopt and run_m1qn3 and fmin=40 (so a much smaller expected decrease) in run_lsopt2 and run_m1qn32.
For the small fmin (very large expected decrease), lsopt (blue) overshoots twice and m1qn3 (green) does better with fewer iterations. For the large fmin (small expected decrease), lsopt (red) and m1qn3 (black) give almost the same results (cost function values start to differ at the tenth significant figure in simulation 2, and this difference then gradually increases). I think they are basically the same because the different Wolfe tests (in lsline/mlis3) are passed the first time around, so that the hacked BFGS in lsopt does not "show up". This holds until simulation 6, where lsopt is better, but then lsopt generates a control vector that makes the model blow up (the forward model!). m1qn3 (black) continues and finds a smaller cost function from simulation 9 onwards. There are 3 peaks in the black curve: the first two are generated in the optimization, and the one at simulation 38 is a cold restart with fmin=4 after an "output mode 6". m1qn3 then continues until iteration 100, where I stopped it with fc=7.789. It looks like m1qn3 would not get anywhere without a further restart.
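The Wolfe tests mentioned above can be written down in a few lines. A minimal sketch of the two conditions as a line search would evaluate them for a trial step t along a descent direction d (my own toy cost function; the c1/c2 values are the usual textbook defaults and not necessarily what lsline/mlis3 use):

```python
def wolfe_ok(f0, g0d, f_t, g_td, t, c1=1e-4, c2=0.9):
    """Check the two Wolfe conditions for a trial step t along direction d.

    f0, g0d  : cost and directional derivative g.d at the current point
    f_t, g_td: cost and directional derivative at x + t*d
    """
    armijo = f_t <= f0 + c1 * t * g0d  # sufficient decrease
    curvature = g_td >= c2 * g0d       # curvature condition
    return armijo and curvature

# honest simulator: f(x) = x^2, starting at x0 = 1 with d = -grad f = -2
f = lambda x: x * x
g = lambda x: 2.0 * x
x0, d = 1.0, -2.0
t = 0.5
print(wolfe_ok(f(x0), g(x0) * d, f(x0 + t * d), g(x0 + t * d) * d, t))  # True
```

If a simulator returns the same cost for every trial step (as in the nfunc loop discussed below in the older mail), the sufficient-decrease test degenerates, which is why the two optimizers only start to diverge once the line search needs more than one trial.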
My conclusions: I would trust my implementation more than lsopt, but there might still be a small problem in my m1qn3 adaptation, which shows up in the comparison of the red and black curves, where lsopt appears to be more efficient (until simulation 6).
So, Matt, I guess you can (and should) try both optimizers for finding the smallest cost function. I would go further: couldn't we find a better method for minimizing a cost function, given that only a limited number of simulations is allowed? Have a look at Michel Crucifix's work, for example: http://www.elic.ucl.ac.be/repomodx/itop, and here's a nice tutorial presentation by him: <http://www.climate.be/users/crucifix/Emulator_talk-o.pdf>
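To caricature the emulator idea in a few lines: fit a cheap surrogate to the simulations already done, and propose the next control vector by minimizing the surrogate instead of the expensive model. A deliberately simple sketch with a one-parameter quadratic fit (purely illustrative, with a toy "model"; real emulators use Gaussian processes and carry uncertainty estimates):

```python
import numpy as np

# costs from a handful of expensive "simulations" (here just a toy function
# standing in for a full forward run of the model)
model = lambda x: (x - 1.3) ** 2 + 0.5
xs = np.array([-1.0, 0.0, 2.0, 3.0])   # control values already simulated
fs = model(xs)                          # their cost function values

# fit a quadratic surrogate J(x) ~ a*x^2 + b*x + c by least squares
a, b, c = np.polyfit(xs, fs, 2)

# minimize the surrogate analytically; this is where the emulator would
# propose the next (expensive) simulation
x_next = -b / (2.0 * a)
print(x_next)  # recovers the true minimum at x = 1.3
```

With a handful of simulations per proposal, such a scheme spends the limited simulation budget on points the surrogate considers promising, rather than on line-search trials.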
Martin
On May 18, 2012, at 8:11 PM, Matthew Mazloff wrote:
> Hi Martin,
>
> Can I please get some clarification as to what you mean by "optim_m1qn3 gets stuck and terminates the optimization with "output mode 6", which usually means, that the gradient is not accurate enough to find a new descent direction"? How does it determine the accuracy of the gradient? Are you checking the non-linearity of the costfunction by comparing the predicted costfinal with the actual costfinal?
>
> It actually sounds like the two line-searches may be complementary... one rigorous method that can fail if the cost function becomes too non-linear, and one that cheats and sometimes wins in practice. So a hybrid approach may be ideal.
>
> Thanks
> Matt
>
> On May 7, 2012, at 11:39 PM, Martin Losch wrote:
>
>> Dear optimizers,
>>
>> I have fiddled some more with the code in MITgcm_contrib/mlosch/optim_m1qn3 and I am fairly confident that the code is now working properly. That is to say that in the "testbed" that I constructed I get identical results with the original "online" m1qn3 and with my modification m1qn3_offline.F
>>
>> "Real-life" tests give mixed results. I am attaching 5 figures. In all plots the "b" experiment (the green line) uses optim_m1qn3 and the blue one the standard optim/lsopt. A "simulation" is one full run of "mitgcmuv_ad". The 4 cmp_cf_opt*.png experiments are with a regional model with open boundaries (control parameters) and ice-shelf cavities. With this experiment I have the problem that lsopt often returns a control vector that is too extreme for the model to swallow (see e.g. opt28 and opt29), so that it explodes in the forward integration (sooner or later). All cmp_cf_opt* experiments have this problem, and the blue line stops whenever it happens. In one case this happens after 90 simulations, but in opt18 already after very few simulations. optim_m1qn3, on the other hand, does much better in opt18 and opt27 (although it seems to get stuck: all simulations are used on the line search and very little improvement is achieved) and not so well for opt28 and opt29, where it seems to get stuck well above the lowest cost values found with lsopt. But all experiments are still running, and there is some hope that the cost function will go down some more.
>>
>> The 5th figure, cmp_cf_MOM17.png, shows a run with a global cs32 simulation with seaice/gmredi/kpp (gmredi/kpp and seaice_dynamics are turned off in the adjoint, I think). There are 4 experiments. MOM17 (blue line) uses lsopt with nfunc=7, MOM17a (red) uses lsopt with nfunc=1 (so here lsopt really does only one simulation per iteration and knows about it), MOM17c (black) uses lsopt with nfunc=100 (just a test), and MOM17b (green) uses optim_m1qn3. Obviously lsopt with nfunc=7 and 1 does much better than m1qn3 (we only allowed 20 simulations). It is interesting that nfunc=1 seems to be the better choice in this case.
>>
>> optim_m1qn3 gets stuck and terminates the optimization with "output mode 6", which usually means that the gradient is not accurate enough to find a new descent direction (plausible, as we only compute an approximate gradient with gmredi etc. turned off). It can also mean that my fiddling with m1qn3 broke it and I still need to spend more time on that, but I could not find a simple case (cost function) where m1qn3_offline fails.
>>
>> A different interpretation is that with optim_m1qn3 I quickly arrive at a local minimum and get stuck. I think that lsopt actually breaks the BFGS algorithm (since the line search always uses the same simulation in the nfunc loop, i.e., for each new x+t*dx that lsline offers, simul returns the same cost function and gradient), and the model therefore accidentally gets pushed out of local minima, whereas m1qn3 does not, because it is more accurate and tries to stay close to a (even small) minimum once it has found one.
>>
>> I am not sure how to continue. Obviously there are cases where lsopt fails, but there are also cases where m1qn3 does not do too well; plus, since we sometimes work with approximate gradients, we might not want to be told that the gradient is too inaccurate for further descent.
>> It would be good to see a few other examples. Maybe you can try to run your problems with optim_m1qn3. Instructions below.
>>
>> Martin
>>
>> cvs co MITgcm_contrib/mlosch/optim_m1qn3
>> Compiling is simpler than for lsopt/optim: edit the Makefile to adjust to your system/compiler and change the include path to point to your build directory (just as in optim/Makefile), then run make depend && make.
>>
>> The resulting optim.x (same name) takes the same input files, and most of the variables in data.optim can stay as they are (eventually I plan to have a separate data.m1qn3 and not use data.optim any more, but for now it's easier for comparisons). There are only TWO things that require attention:
>> - numiter (in data.optim) must be larger than one. It is now the number of optimization iterations that you are going to allow ***in total***. I'd put something large, like 1000
>> - m1qn3 produces an output file (m1qn3_output.txt; I hard-coded the name for now) that is reopened, and thus overwritten, each time you run optim.x, so make sure that you keep or restore a copy in the working directory. One could modify optim_sub.F to redirect the m1qn3 output to stdout, but I like it better the way it is implemented now.
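For reference, the only edit to data.optim would then look something like the following fragment (a hedged sketch: the namelist group name OPTIM is my assumption, and all other entries of your existing file stay untouched):

```fortran
 &OPTIM
! number of optimization iterations allowed IN TOTAL,
! not per call of optim.x -- pick something large
 numiter = 1000,
 &
```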
>>
>> <cmp_cf_opt18.png><cmp_cf_opt27.png><cmp_cf_opt28.png><cmp_cf_opt29.png><cmp_cf_MOM17.png>
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cmp_cf_run_lsopt.png
Type: image/png
Size: 49801 bytes
Desc: not available
URL: <http://mitgcm.org/pipermail/mitgcm-devel/attachments/20120521/695f2771/attachment-0001.png>