[MITgcm-support] optim_m1qn3 line search getting stuck
Andrew McRae
andrew.mcrae at physics.ox.ac.uk
Wed Feb 13 08:30:52 EST 2019
Okay, I *think* I understand what is happening. In short, the BFGS
algorithm, which underpins m1qn3, is only guaranteed to behave sensibly if
the cost function is convex. If this is not the case, it can produce
approximations to H, the inverse Hessian, which aren't positive definite.
This can lead to the algorithm producing uphill search directions.
(gradient g, descent direction d, d = -Hg. If H is pos def, g.d = g.(-Hg),
which is < 0 as long as g isn't 0. If H isn't pos-def, this isn't
guaranteed). We're definitely seeing d.g > 0 in the output above, which
can't be good.
Maybe I, or someone else, should write a wrapper that automatically
cold-restarts m1qn3 if something goes wrong (either by checking d.g, or by
checking y.s, see below)?
Longer version:
Define s_k = x_{k+1} - x_k, the state increment
Define y_k = g_{k+1} - g_k, the gradient increment.
BFGS computes an approximation H_{k+1} for the inverse Hessian, from H_k,
s_k, and y_k. If H_k is symmetric then H_{k+1} is also symmetric.
Furthermore, if H_k is positive definite *and <y, s> > 0*, H_{k+1} is also
positive definite. <y, s> > 0 is true for sensible updates steps in a
convex optimization problem, but not true in general (think of a parabola,
but imagine it briefly steepening as you approach the minimum due to
nonlinearities). So a negative-definite H can be generated.
The version used is actually L-BFGS, which maintains a diagonal matrix D_k,
and forms H_k from D_k and the m most recent pairs (s_k, y_k), but the same
things hold.
On Tue, 12 Feb 2019 at 16:11, Andrew McRae <andrew.mcrae at physics.ox.ac.uk>
wrote:
> Actually I was wrong, it's the first Wolfe test being violated. I.e. f(x
> + t*d) turns out to be larger than f(x), even for very small t, where t is
> the step multiplier and d is the search direction.
>
> The values being printed out are t, f(x + t*d) - f(x), and <d, g>, where g
> is the gradient. The second number should be negative for small enough t,
> and the third number should definitely be negative else it's not a descent
> direction.
>
> In fact, even in a sucessful run, this occasionally happens (2nd number
> negative = good, but third number is often positive!)
>
> mlis3 1.000D+00 -1.108D+00 -9.114D-01
> mlis3 1.000D+00 -1.379D-01 6.624D+00
> mlis3 1.000D+00 -5.767D-01 4.122D+00
> mlis3 1.000D+00 -1.120D+00 -7.614D-01
> mlis3 1.000D+00 -5.014D-01 -3.579D-01
> mlis3 1.000D+00 -6.903D-01 -1.962D-01
> mlis3 1.000D+00 -1.657D-01 1.824D-01
>
> I'm now confused, because the lines
> https://github.com/dorugeber/MITgcm/blob/optim_m1qn3/optim_m1qn3/m1qn3_offline.F#L551-L559
> seem to guard against the case d.g > 0.
>
> On Tue, 12 Feb 2019 at 12:02, Andrew McRae <andrew.mcrae at physics.ox.ac.uk>
> wrote:
>
>> This is using the optim_m1qn3 package from mitgcm_contrib.
>>
>> Quite often, the algorithm runs a few steps, then gets stuck in a line
>> search. E.g., after 3 good iterations, I get
>>
>> m1qn3: iter 4, simul 4, f= 3.25818816D+00, h'(0)=-2.84510D+00
>>
>> m1qn3: line search
>>
>> mlis3 fpn=-2.845D+00 d2= 9.74D+01 tmin= 5.72D-07 tmax=
>> 1.00D+20
>> mlis3 1.000D+00 8.992D-01
>> 1.438D+00
>> mlis3 2.468D-01 1.155D-01
>> 6.129D-01
>> mlis3 2.468D-03 7.656D-04
>> 2.227D+02
>> mlis3 2.345D-03 7.240D-04
>> 1.019D+01
>> mlis3 1.759D-03 5.496D-04
>> 2.970D+00
>> mlis3 1.231D-03 3.855D-04
>> -3.000D+00
>> mlis3 8.618D-04 2.713D-04
>> 3.105D-01
>> mlis3 6.032D-04 1.894D-04
>> 2.202D-01
>> mlis3 4.223D-04 1.341D-04
>> 2.223D+00
>> mlis3 2.956D-04 9.450D-05
>> 3.954D-01
>> mlis3 2.069D-04 6.755D-05
>> 3.121D-01
>> mlis3 6.207D-05 1.948D-05
>> 3.642D-01
>>
>> This is for the included tutorial_global_oce_optim running for 1 year,
>> and with mult_hflux_tut set to 0.2 rather than 2.
>>
>> To be honest, I'm not even sure what numbers are being printed, but I
>> suspect one of those first two numbers is the step size multiplier. I.e.
>> it's trying to take smaller and smaller steps, but these are still being
>> rejected.
>>
>> I'm about to dive in with gdb and see what's going on, but my hypothesis
>> is that the second Wolfe test
>> <https://en.wikipedia.org/wiki/Wolfe_conditions> is being violated.
>> Roughly speaking, this forces the gradient to decrease by 10% each
>> iteration (at least, the component of the gradient in the descent
>> direction). This makes sense once the algorithm is in the basin near the
>> minimizer of the cost function, but there's no apriori reason for it to
>> hold further away.
>>
>> Is there likely to be anything wrong with modifying this check to allow
>> (some) gradient steepening? I guess I'll find out...
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.mitgcm.org/pipermail/mitgcm-support/attachments/20190213/f09d23bd/attachment.html>
More information about the MITgcm-support
mailing list