[MITgcm-support] optim_m1qn3 line search getting stuck

Tue Feb 12 11:11:45 EST 2019

Actually, I was wrong: it's the first Wolfe test (sufficient decrease) that
is being violated.  I.e. f(x + t*d) turns out to be larger than f(x), even
for very small t, where t is the step multiplier and d is the search
direction.

The values being printed out are t, f(x + t*d) - f(x), and <d, g>, where g
is the gradient.  The second number should be negative for small enough t,
and the third number should definitely be negative else it's not a descent
direction.
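
For reference, here is a minimal sketch of the two Wolfe tests written in
terms of the printed quantities.  It assumes the textbook (weak) form of the
conditions with constants c1 = 1e-4 and c2 = 0.9 (conventional values, not
read from the m1qn3 source), that the third printed number is the directional
derivative at the trial point x + t*d, and that the initial slope <d, g(x)>
is the h'(0) value from the iteration header in the quoted log below.  The
function name wolfe_tests is made up for illustration.

```python
def wolfe_tests(t, df, slope_new, slope0, c1=1e-4, c2=0.9):
    """Return (sufficient_decrease_ok, curvature_ok) for one trial step.

    t          step multiplier
    df         f(x + t*d) - f(x)
    slope_new  <d, g(x + t*d)>, directional derivative at the trial point
               (assumed meaning of the third printed number)
    slope0     <d, g(x)>, directional derivative at the start of the search
    c1, c2     assumed textbook constants, with 0 < c1 < c2 < 1
    """
    if slope0 >= 0.0:
        raise ValueError("d is not a descent direction: <d, g(x)> >= 0")
    # First Wolfe test: f must drop by at least a fraction c1 of the
    # decrease predicted by the initial slope.
    sufficient_decrease_ok = df <= c1 * t * slope0
    # Second (weak) Wolfe test: the slope along d must have risen to at
    # least c2 times the (negative) initial slope.
    curvature_ok = slope_new >= c2 * slope0
    return sufficient_decrease_ok, curvature_ok


# Two trial steps from the quoted log below, with h'(0) = -2.845.
print(wolfe_tests(1.000e+00, 8.992e-01, 1.438e+00, -2.845))   # (False, True)
print(wolfe_tests(1.231e-03, 3.855e-04, -3.000e+00, -2.845))  # (False, False)
```

Under the weak form sketched here, a positive third number at a trial point
does not by itself fail either test; only a positive slope at t = 0 would
mean d is not a descent direction.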

In fact, even in a successful run, this occasionally happens (2nd number
negative = good, but third number is often positive!):

     mlis3     1.000D+00 -1.108D+00 -9.114D-01
     mlis3     1.000D+00 -1.379D-01  6.624D+00
     mlis3     1.000D+00 -5.767D-01  4.122D+00
     mlis3     1.000D+00 -1.120D+00 -7.614D-01
     mlis3     1.000D+00 -5.014D-01 -3.579D-01
     mlis3     1.000D+00 -6.903D-01 -1.962D-01
     mlis3     1.000D+00 -1.657D-01  1.824D-01
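
To scan a longer log for this pattern, a small parser along the following
lines may help.  It is only a sketch: it assumes each trial line carries
exactly three numbers in Fortran D-exponent notation, as in the excerpt
above, and the names parse_mlis3 and count_positive_third are invented here.

```python
import re

# Fortran prints double-precision exponents with 'D' (e.g. 1.000D+00),
# which Python's float() does not accept, so convert 'D' to 'E' first.
_NUM = re.compile(r"[-+]?\d*\.?\d+[DdEe][-+]?\d+")


def parse_mlis3(line):
    """Return the numbers found on one mlis3 trial line as floats."""
    return [float(tok.upper().replace("D", "E")) for tok in _NUM.findall(line)]


def count_positive_third(log):
    """Count trial lines whose third number is positive."""
    count = 0
    for line in log.splitlines():
        values = parse_mlis3(line)
        if len(values) == 3 and values[2] > 0.0:
            count += 1
    return count


LOG = """\
     mlis3     1.000D+00 -1.108D+00 -9.114D-01
     mlis3     1.000D+00 -1.379D-01  6.624D+00
     mlis3     1.000D+00 -5.767D-01  4.122D+00
     mlis3     1.000D+00 -1.120D+00 -7.614D-01
     mlis3     1.000D+00 -5.014D-01 -3.579D-01
     mlis3     1.000D+00 -6.903D-01 -1.962D-01
     mlis3     1.000D+00 -1.657D-01  1.824D-01
"""

print(count_positive_third(LOG))  # 3
```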

I'm now confused, because the lines
https://github.com/dorugeber/MITgcm/blob/optim_m1qn3/optim_m1qn3/m1qn3_offline.F#L551-L559
seem to guard against the case <d, g> > 0.

On Tue, 12 Feb 2019 at 12:02, Andrew McRae <andrew.mcrae at physics.ox.ac.uk>
wrote:

> This is using the optim_m1qn3 package from mitgcm_contrib.
>
> Quite often, the algorithm runs a few steps, then gets stuck in a line
> search.  E.g., after 3 good iterations, I get
>
>  m1qn3: iter 4, simul 4, f= 3.25818816D+00, h'(0)=-2.84510D+00
>
>  m1qn3: line search
>
>      mlis3       fpn=-2.845D+00 d2= 9.74D+01  tmin= 5.72D-07 tmax= 1.00D+20
>      mlis3                                      1.000D+00  8.992D-01  1.438D+00
>      mlis3                                      2.468D-01  1.155D-01  6.129D-01
>      mlis3                                      2.468D-03  7.656D-04  2.227D+02
>      mlis3                                      2.345D-03  7.240D-04  1.019D+01
>      mlis3                                      1.759D-03  5.496D-04  2.970D+00
>      mlis3                                      1.231D-03  3.855D-04 -3.000D+00
>      mlis3                                      8.618D-04  2.713D-04  3.105D-01
>      mlis3                                      6.032D-04  1.894D-04  2.202D-01
>      mlis3                                      4.223D-04  1.341D-04  2.223D+00
>      mlis3                                      2.956D-04  9.450D-05  3.954D-01
>      mlis3                                      2.069D-04  6.755D-05  3.121D-01
>      mlis3                                      6.207D-05  1.948D-05  3.642D-01
>
> This is for the included tutorial_global_oce_optim running for 1 year, and
> with mult_hflux_tut set to 0.2 rather than 2.
>
> To be honest, I'm not even sure what numbers are being printed, but I
> suspect one of those first two numbers is the step size multiplier.  I.e.
> it's trying to take smaller and smaller steps, but these are still being
> rejected.
>
> I'm about to dive in with gdb and see what's going on, but my hypothesis
> is that the second Wolfe test
> <https://en.wikipedia.org/wiki/Wolfe_conditions> is being violated.
> Roughly speaking, this forces the gradient to decrease by 10% each
> iteration (at least, the component of the gradient in the descent
> direction).  This makes sense once the algorithm is in the basin near the
> minimizer of the cost function, but there's no a priori reason for it to
> hold further away.
>
> Is there likely to be anything wrong with modifying this check to allow
> (some) gradient steepening?  I guess I'll find out...
>