[MITgcm-support] optim_m1qn3 line search getting stuck

Andrew McRae andrew.mcrae at physics.ox.ac.uk
Fri Feb 22 08:10:12 EST 2019


Just a small update: this was wrong, and the m1qn3 code adequately checks
that <y, s> is positive and <d, g> is negative.  If these are violated, the
code exits pretty quickly with omode set to 7.

The numbers being printed in the log file, which I was worried about, seem
to be <d_old, g_new>.  It's fine for this to be positive: it just means
the step went past the minimum and up the other side.
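
A toy illustration of that last point (not taken from m1qn3; the quadratic
f, its gradient, and the factor 1.5 are assumptions made up for this sketch):
a step that overshoots the minimum still lowers the cost, yet <d_old, g_new>
comes out positive.

```python
def f(x):
    # Simple convex quadratic, a stand-in for the real cost function.
    return 0.5 * sum(xi * xi for xi in x)


def grad(x):
    # Gradient of the quadratic above.
    return list(x)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x_old = [1.0, 2.0]
g_old = grad(x_old)
# Downhill direction, deliberately 1.5x too long so the step overshoots.
d_old = [-1.5 * gi for gi in g_old]
t = 1.0

x_new = [xi + t * di for xi, di in zip(x_old, d_old)]
g_new = grad(x_new)

slope_at_start = dot(d_old, g_old)  # <d_old, g_old>: must be negative
slope_at_end = dot(d_old, g_new)    # <d_old, g_new>: sign is not constrained
decrease = f(x_new) - f(x_old)      # negative means the cost went down

print(slope_at_start, slope_at_end, decrease)
```

Here <d_old, g_old> is negative (a genuine descent direction), the cost
decreases, and <d_old, g_new> is positive because the new point sits on the
far side of the minimum.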

(The control flow in this routine is disgusting!  Welcome to 1980s
Fortran.  It was only clear when various variables were being updated once
I stepped through line-by-line with a debugger for a few iterations.)



On Wed, 13 Feb 2019 at 13:30, Andrew McRae <andrew.mcrae at physics.ox.ac.uk>
wrote:

> Okay, I *think* I understand what is happening.  In short, the BFGS
> algorithm, which underpins m1qn3, is only guaranteed to behave sensibly if
> the cost function is convex.  If this is not the case, it can produce
> approximations to H, the inverse Hessian, which aren't positive definite.
> This can lead to the algorithm producing uphill search directions.
> (gradient g, descent direction d, d = -Hg.  If H is pos def, g.d = g.(-Hg),
> which is < 0 as long as g isn't 0.  If H isn't pos-def, this isn't
> guaranteed).  We're definitely seeing d.g > 0 in the output above, which
> can't be good.
>
> Maybe I, or someone else, should write a wrapper that automatically
> cold-restarts m1qn3 if something goes wrong (either by checking d.g, or by
> checking y.s, see below)?
>
> Longer version:
>
> Define s_k = x_{k+1} - x_k, the state increment
> Define y_k = g_{k+1} - g_k, the gradient increment.
>
> BFGS computes an approximation H_{k+1} for the inverse Hessian, from H_k,
> s_k, and y_k.  If H_k is symmetric then H_{k+1} is also symmetric.
> Furthermore, if H_k is positive definite *and <y, s> > 0*, H_{k+1} is
> also positive definite.  <y, s> > 0 is true for sensible update steps in a
> convex optimization problem, but not true in general (think of a parabola,
> but imagine it briefly steepening as you approach the minimum due to
> nonlinearities).  So an H that is not positive definite can be generated.
>
> The version used is actually L-BFGS, which maintains a diagonal matrix
> D_k, and forms H_k from D_k and the m most recent pairs (s_k, y_k), but the
> same things hold.
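
A minimal sketch of the full-memory inverse BFGS update described above, for
a 2x2 case only.  This is not the m1qn3 code (which uses the limited-memory
form); the function names and the numbers for s, y, and g are assumptions
chosen for illustration.  It shows H staying positive definite when
<y, s> > 0, and losing that property (and giving an uphill direction) when
<y, s> < 0.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def bfgs_inverse_update(H, s, y):
    """Return H_new = V^T H V + rho * s s^T, with V = I - rho * y s^T."""
    n = len(s)
    rho = 1.0 / dot(y, s)
    V = [[(1.0 if i == j else 0.0) - rho * y[i] * s[j] for j in range(n)]
         for i in range(n)]
    HV = [[sum(H[i][k] * V[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(V[k][i] * HV[k][j] for k in range(n)) + rho * s[i] * s[j]
             for j in range(n)] for i in range(n)]


def is_positive_definite_2x2(H):
    # Sylvester's criterion for a symmetric 2x2 matrix.
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return H[0][0] > 0.0 and det > 0.0


H0 = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 0.0]

H_good = bfgs_inverse_update(H0, s, [2.0, 0.0])   # <y, s> = 2 > 0
H_bad = bfgs_inverse_update(H0, s, [-2.0, 0.0])   # <y, s> = -2 < 0

# Search direction d = -H g for a made-up gradient g.
g = [1.0, 0.0]
d_bad = [-dot(row, g) for row in H_bad]

print(is_positive_definite_2x2(H_good), is_positive_definite_2x2(H_bad))
print(dot(d_bad, g))  # positive: d_bad points uphill
```

In the second case the updated matrix has one negative and one positive
eigenvalue, so it is indefinite rather than negative definite, which is
already enough to produce d.g > 0.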
>
>
>
> On Tue, 12 Feb 2019 at 16:11, Andrew McRae <andrew.mcrae at physics.ox.ac.uk>
> wrote:
>
>> Actually I was wrong, it's the first Wolfe test being violated.  I.e. f(x
>> + t*d) turns out to be larger than f(x), even for very small t, where t is
>> the step multiplier and d is the search direction.
>>
>> The values being printed out are t, f(x + t*d) - f(x), and <d, g>, where
>> g is the gradient.  The second number should be negative for small enough
>> t, and the third number should definitely be negative else it's not a
>> descent direction.
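
A sketch of the first Wolfe (sufficient decrease) test applied to a few of
the trial steps from the stuck line search in the original message.  The
function name and the constant c1 = 1e-4 are assumptions (a commonly used
value, not read from the m1qn3 source); the slope at the start of the search
is taken from the fpn / h'(0) value printed in that log.

```python
def sufficient_decrease(t, df, slope0, c1=1.0e-4):
    """First Wolfe test: f(x + t*d) - f(x) <= c1 * t * h'(0).

    t      -- step multiplier
    df     -- f(x + t*d) - f(x)
    slope0 -- h'(0) = <d, g(x)>, negative for a descent direction
    """
    return df <= c1 * t * slope0


slope0 = -2.845  # fpn in the log

# (t, f(x + t*d) - f(x)) for the first, second, and last trial steps.
trials = [
    (1.000e+00, 8.992e-01),
    (2.468e-01, 1.155e-01),
    (6.207e-05, 1.948e-05),
]

results = [sufficient_decrease(t, df, slope0) for t, df in trials]
print(results)
```

Every trial fails because the cost difference is positive, whereas the
right-hand side of the test is negative whenever h'(0) < 0.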
>>
>> In fact, even in a successful run, this occasionally happens (2nd number
>> negative = good, but third number is often positive!)
>>
>>      mlis3     1.000D+00 -1.108D+00 -9.114D-01
>>      mlis3     1.000D+00 -1.379D-01  6.624D+00
>>      mlis3     1.000D+00 -5.767D-01  4.122D+00
>>      mlis3     1.000D+00 -1.120D+00 -7.614D-01
>>      mlis3     1.000D+00 -5.014D-01 -3.579D-01
>>      mlis3     1.000D+00 -6.903D-01 -1.962D-01
>>      mlis3     1.000D+00 -1.657D-01  1.824D-01
>>
>> I'm now confused, because the lines
>> https://github.com/dorugeber/MITgcm/blob/optim_m1qn3/optim_m1qn3/m1qn3_offline.F#L551-L559
>> seem to guard against the case d.g > 0.
>>
>> On Tue, 12 Feb 2019 at 12:02, Andrew McRae <andrew.mcrae at physics.ox.ac.uk>
>> wrote:
>>
>>> This is using the optim_m1qn3 package from mitgcm_contrib.
>>>
>>> Quite often, the algorithm runs a few steps, then gets stuck in a line
>>> search.  E.g., after 3 good iterations, I get
>>>
>>>  m1qn3: iter 4, simul 4, f= 3.25818816D+00, h'(0)=-2.84510D+00
>>>
>>>  m1qn3: line search
>>>
>>>      mlis3       fpn=-2.845D+00 d2= 9.74D+01  tmin= 5.72D-07 tmax= 1.00D+20
>>>      mlis3                                      1.000D+00  8.992D-01  1.438D+00
>>>      mlis3                                      2.468D-01  1.155D-01  6.129D-01
>>>      mlis3                                      2.468D-03  7.656D-04  2.227D+02
>>>      mlis3                                      2.345D-03  7.240D-04  1.019D+01
>>>      mlis3                                      1.759D-03  5.496D-04  2.970D+00
>>>      mlis3                                      1.231D-03  3.855D-04 -3.000D+00
>>>      mlis3                                      8.618D-04  2.713D-04  3.105D-01
>>>      mlis3                                      6.032D-04  1.894D-04  2.202D-01
>>>      mlis3                                      4.223D-04  1.341D-04  2.223D+00
>>>      mlis3                                      2.956D-04  9.450D-05  3.954D-01
>>>      mlis3                                      2.069D-04  6.755D-05  3.121D-01
>>>      mlis3                                      6.207D-05  1.948D-05  3.642D-01
>>>
>>> This is for the included tutorial_global_oce_optim running for 1 year,
>>> and with mult_hflux_tut set to 0.2 rather than 2.
>>>
>>> To be honest, I'm not even sure what numbers are being printed, but I
>>> suspect one of those first two numbers is the step size multiplier.  I.e.
>>> it's trying to take smaller and smaller steps, but these are still being
>>> rejected.
>>>
>>> I'm about to dive in with gdb and see what's going on, but my hypothesis
>>> is that the second Wolfe test
>>> <https://en.wikipedia.org/wiki/Wolfe_conditions> is being violated.
>>> Roughly speaking, this forces the gradient to decrease by 10% each
>>> iteration (at least, the component of the gradient in the descent
>>> direction).  This makes sense once the algorithm is in the basin near the
>>> minimizer of the cost function, but there's no a priori reason for it to
>>> hold further away.
>>>
>>> Is there likely to be anything wrong with modifying this check to allow
>>> (some) gradient steepening?  I guess I'll find out...
>>>
>>
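
A sketch of the second Wolfe (curvature) test discussed in the original
message.  The function name is hypothetical, and c2 = 0.9 is an assumption
chosen to match the "10%" figure quoted there rather than a value read from
the m1qn3 source.  The slopes are the third column of the first four trial
steps in that log, read as <d_old, g_new> as suggested in the most recent
message of the thread.

```python
def curvature_ok(slope_t, slope0, c2=0.9):
    """Second Wolfe test: h'(t) >= c2 * h'(0).

    slope_t -- h'(t) = <d, g(x + t*d)>, the slope at the trial point
    slope0  -- h'(0) = <d, g(x)>, negative for a descent direction
    """
    return slope_t >= c2 * slope0


slope0 = -2.845  # fpn / h'(0) in the log

# Third printed column for the first four trials, plus the one negative entry.
slopes = [1.438e+00, 6.129e-01, 2.227e+02, 1.019e+01, -3.000e+00]

passed = [curvature_ok(s, slope0) for s in slopes]
print(passed)
```

Under these assumptions only the trial with slope -3.000 fails the curvature
test, since it is steeper than 0.9 * h'(0); the trials with positive slope
pass it, which is consistent with the later finding that the first Wolfe test
is the one being violated.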