[MITgcm-devel] optim_m1qn3
Matthew Mazloff
mmazloff at ucsd.edu
Fri May 18 14:11:00 EDT 2012
Hi Martin,
Can I please get some clarification as to what you mean by
"optim_m1qn3 gets stuck and terminates the optimization with "output
mode 6", which usually means that the gradient is not accurate enough
to find a new descent direction"? How does it determine the
accuracy of the gradient? Are you checking the non-linearity of the
cost function by comparing the predicted costfinal with the actual
costfinal?
It actually sounds like the two line searches may be
complementary: one rigorous, which can fail if the cost function becomes
too non-linear, and one that cheats and sometimes wins in practice.
So a hybrid approach may be ideal.
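For what it's worth, here is the kind of nonlinearity check I have in
mind, as a toy sketch with a made-up 1-D cost function (my own
illustration, not anything from the actual code):

```python
# Toy sketch: compare the line-search model's predicted cost with the
# actual cost after a trial step. A ratio far from 1 flags that the
# cost function is too non-linear over this step length.
def f(x):  return x**4           # made-up cost function
def df(x): return 4 * x**3       # its gradient

x0, t = 1.0, 0.5                 # current point and trial step size
d = -df(x0)                      # steepest-descent direction, d = -4
f0, gd = f(x0), df(x0) * d       # f0 = 1, directional derivative g.d = -16

predicted = f0 + t * gd          # linear model of the cost: 1 - 8 = -7
actual    = f(x0 + t * d)        # f(-1) = 1: no decrease at all
ratio     = (actual - f0) / (predicted - f0)
print(predicted, actual, ratio)  # ratio 0.0: the linear model was useless here
```

A ratio near 1 would say the step was well modeled; here it is 0, i.e.
the step overshot into territory where the linear prediction fails.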
Thanks
Matt
On May 7, 2012, at 11:39 PM, Martin Losch wrote:
> Dear optimizers,
>
> I have fiddled some more with the code in MITgcm_contrib/mlosch/
> optim_m1qn3 and I am fairly confident that the code is now working
> properly. That is to say that in the "testbed" that I constructed I
> get identical results with the original "online" m1qn3 and with my
> modification m1qn3_offline.F
>
> "Real life" tests give mixed results. I am attaching 5 figures. In
> all plots the "b" experiment (the green line) is with optim_m1qn3
> and the blue one with the standard optim/lsopt. A "simulation" is
> one full run of "mitgcmuv_ad". The 4 cmp_cf_opt*.png
> experiments are with a regional model with open boundaries (the
> control parameters) and ice shelf cavities. With this experiment I
> have the problem that lsopt often returns a control vector that is
> too extreme (see e.g. opt28 and opt29) for the model to swallow, so
> it explodes in the forward integration (sooner or later). All
> cmp_cf_opt* experiments have this problem, and the blue line stops
> whenever this happens. In one case this happens after 90 simulations,
> but in opt18 already after very few simulations. optim_m1qn3, on the
> other hand, does much better in opt18 and opt27 (although it seems to
> get stuck: all simulations are used on the line search and very
> little improvement is achieved) and not so well in opt28 and opt29,
> where it seems to get stuck well above the lowest cost values found
> with lsopt. But all experiments are still running, and there is some
> hope that the cost function will go down some more.
>
> The 5th figure, cmp_cf_MOM17.png, shows a run with a global cs32
> simulation with seaice/gmredi/kpp (gmredi/kpp and seaice_dynamics
> are turned off in the adjoint, I think). There are 4 experiments:
> MOM17 (blue line) uses lsopt and nfunc=7, MOM17a (red) uses lsopt
> and nfunc=1 (so here lsopt really does only one simulation per
> iteration, and lsopt knows about it), MOM17c (black) uses lsopt and
> nfunc=100 (just a test), and MOM17b (green) uses optim_m1qn3.
> Obviously lsopt with nfunc=7 and nfunc=1 does much better than m1qn3
> (we only allowed each to do 20 simulations). It is interesting that
> nfunc=1 seems to be the better choice in this case.
>
> optim_m1qn3 gets stuck and terminates the optimization with "output
> mode 6", which usually means that the gradient is not accurate
> enough to find a new descent direction (plausible, as we only
> compute an approximate gradient with gmredi etc. turned off). It can
> also mean that my fiddling with m1qn3 broke it, and I still need to
> spend more time on that, but I could not find a simple case (cost
> function) where m1qn3_offline fails.
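> To illustrate what I mean by the gradient being too inaccurate, here
> is a made-up two-variable sketch (my own toy numbers, nothing from
> the actual code):

```python
# Hypothetical illustration: when the gradient error is larger than the
# gradient itself, a direction built from the approximate gradient need
# not be a descent direction for the true cost, and any rigorous line
# search must then fail (m1qn3's "output mode 6" situation).
g_true   = [1.0, 0.0]        # "true" gradient at the current point
g_approx = [-1.0, 0.2]       # approximate gradient (error dominates)
d = [-g for g in g_approx]   # steepest-descent direction from g_approx

dot = sum(a * b for a, b in zip(g_true, d))
print(dot)                   # 1.0 > 0: an ASCENT direction for the true cost
```

> With an error this large no step length along d can reduce the true
> cost, so the optimizer has to give up.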
>
> A different interpretation is that with optim_m1qn3 I quickly arrive
> at a local minimum and get stuck. I think that lsopt actually breaks
> the BFGS algorithm (since the line search always uses the same
> simulation in the nfunc-loop, i.e., for each new x+t*dx that lsline
> offers, simul returns the same cost function and gradient), and the
> model therefore accidentally gets pushed out of local minima, whereas
> m1qn3 does not, because it is more accurate and stays close to a
> (even shallow) minimum once it has found it.
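> To make the nfunc-loop point concrete, here is a toy sketch (mine,
> not the actual lsopt code) of what happens to a Wolfe-type test when
> simul returns frozen values; the numbers are made up:

```python
# In the offline setup simul cannot rerun the model, so every trial
# step t in the nfunc-loop gets back the SAME cost f0 and directional
# derivative gd that were computed at the current iterate x.
f0 = 10.0           # cost at x
gd = -5.0           # directional derivative g.d at x (descent: < 0)
c1, c2 = 1e-4, 0.9  # typical Wolfe constants

for t in (1.0, 0.5, 0.25):
    f_new, gd_new = f0, gd  # frozen values: the model was never rerun
    armijo    = f_new <= f0 + c1 * t * gd  # f0 <= f0 - small  -> False
    curvature = gd_new >= c2 * gd          # -5.0 >= -4.5      -> False
    print(t, armijo, curvature)
```

> Neither condition can ever hold with frozen values, so the loop just
> burns its nfunc budget, and the step that is finally taken was never
> really validated; hence the accidental kicks out of local minima.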
>
> I am not sure how to continue. Obviously there are cases where lsopt
> fails, but there are also cases where m1qn3 does not do too well;
> moreover, since we are sometimes working with approximate gradients,
> we might not want to be told that the gradient is too inaccurate for
> further descent.
> It would be good to see a few other examples. Maybe you can try to
> run your problems with optim_m1qn3. Instructions are below.
>
> Martin
>
> cvs co MITgcm_contrib/mlosch/optim_m1qn3
> Compiling is simpler than for lsopt/optim: edit the Makefile to
> adjust it to your system/compiler and change the include path to
> point to your build directory (just as in optim/Makefile), then run
> make depend && make
>
> The resulting optim.x (same name) takes the same input files, and
> most of the variables in data.optim can stay as they are (eventually
> I plan to have a separate data.m1qn3 and not use data.optim any
> more, but for now it is easier for comparisons). There are only TWO
> things that require attention:
> - numiter (in data.optim) must be larger than one. It is now the
> number of optimization iterations that you are going to allow ***in
> total***. I would put in something large, like 1000.
> - m1qn3 produces an output file (m1qn3_output.txt; I hard-coded the
> name for now) that is reopened each time you run optim.x, so
> make sure that you keep or restore a copy in the working directory.
> One could modify optim_sub.F to redirect the m1qn3 output to stdout,
> but I like it better the way it is implemented now.
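> Just to be explicit about the bookkeeping, a throwaway sketch
> (m1qn3_output.txt is the hard-coded name; the backup naming scheme
> and the run steps in the comments are placeholders of my choosing):

```python
import os
import shutil

# Sketch of the file bookkeeping around one offline optimization step.
def backup_m1qn3_state(workdir, itno):
    """Keep a per-iteration copy so a later optim.x run can pick it up."""
    src = os.path.join(workdir, "m1qn3_output.txt")
    dst = os.path.join(workdir, "m1qn3_output.txt.%04d" % itno)
    shutil.copy2(src, dst)
    return dst

def restore_m1qn3_state(workdir, itno):
    """Put the saved copy back before rerunning optim.x for iteration itno."""
    src = os.path.join(workdir, "m1qn3_output.txt.%04d" % itno)
    shutil.copy2(src, os.path.join(workdir, "m1qn3_output.txt"))

# Per iteration one would then roughly:
#   1. run mitgcmuv_ad             (produces cost and gradient)
#   2. restore_m1qn3_state(...)    (if the working copy was clobbered)
#   3. run optim.x                 (reads and updates m1qn3_output.txt)
#   4. backup_m1qn3_state(...)
```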
>
> <cmp_cf_opt18.png> <cmp_cf_opt27.png> <cmp_cf_opt28.png> <cmp_cf_opt29.png> <cmp_cf_MOM17.png>
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel