[MITgcm-support] optim_m1qn3 line search getting stuck
Andrew McRae
andrew.mcrae at physics.ox.ac.uk
Wed Feb 13 18:24:47 EST 2019
Good question. I don't know. I don't have much practical experience in
this area.
My gut feeling is that if you're far enough away from the minimum for
non-convexity to be a problem, gradient descent (i.e. a cold start) is as
good a choice as any. But maybe this is unduly pessimistic.
There's definitely literature on globalisation strategies for BFGS. One of
the most highly cited papers is
https://doi.org/10.1016/S0377-0427(00)00540-9 (no numerical examples, only
proofs), and that simply replaces y_k by y_k' = y_k + r_k*s_k, where r_k is
a scalar of order ||g_k||.
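As a rough sketch of what that kind of modification looks like in code
(Python/NumPy; the precise choice of r_k below is my own guess at a
safeguard of order ||g_k||, not lifted from the paper):

    import numpy as np

    def modified_y(y, s, g, c=1.0):
        """Modified gradient increment y' = y + r*s, with r of order ||g||.

        The idea is to pick r large enough that <y', s> > 0, so the BFGS
        update keeps its inverse-Hessian approximation positive definite
        even on non-convex problems.  The exact rule for r used here is an
        assumption, not a quote from the paper above.
        """
        ys = np.dot(y, s)
        ss = np.dot(s, s)
        if ys > 0.0:
            return y                  # curvature condition already holds
        r = c * np.linalg.norm(g) + max(0.0, -ys / ss)
        return y + r * s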
On Wed, 13 Feb 2019 at 21:48, Martin Losch <Martin.Losch at awi.de> wrote:
> Hi Andrew,
>
> do you think you could “cold start” the routine in the middle of an
> optimization, i.e. throw away all stored Hessian information, or just the
> new (problematic) contribution with <y, s> < 0? I find this routine so
> difficult to read (especially with all the French comments) and haven’t
> had a good look at it in a long time.
>
> Martin
>
> > On 13. Feb 2019, at 14:30, Andrew McRae <andrew.mcrae at physics.ox.ac.uk>
> wrote:
> >
> > Okay, I think I understand what is happening. In short, the BFGS
> algorithm, which underpins m1qn3, is only guaranteed to behave sensibly if
> the cost function is convex. If this is not the case, it can produce
> approximations to H, the inverse Hessian, which aren't positive definite.
> This can lead to the algorithm producing uphill search directions.
> (With gradient g and search direction d = -Hg: if H is positive definite,
> then g.d = -g.Hg < 0 as long as g isn't 0; if H isn't positive definite,
> this isn't guaranteed.) We're definitely seeing d.g > 0 in the mlis3 output
> quoted further down, which can't be good.
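> >
> > As a toy illustration of that sign argument (nothing m1qn3-specific,
> > just NumPy):
> >
> >     import numpy as np
> >
> >     g = np.array([1.0, 2.0])
> >     H_spd = np.diag([2.0, 1.0])    # symmetric positive definite
> >     H_ind = np.diag([1.0, -3.0])   # indefinite: one negative eigenvalue
> >
> >     for H in (H_spd, H_ind):
> >         d = -H @ g                 # quasi-Newton search direction
> >         print(np.dot(g, d))        # -6.0 (downhill) vs +11.0 (uphill!)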
> >
> > Maybe I, or someone else, should write a wrapper that automatically
> cold-restarts m1qn3 if something goes wrong (either by checking d.g, or by
> checking y.s, see below)?
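> >
> > Schematically, the trigger for such a restart could be as simple as the
> > following (a hypothetical helper, not anything that exists in
> > optim_m1qn3):
> >
> >     import numpy as np
> >
> >     def needs_cold_restart(d, g, s, y):
> >         """Hypothetical safeguard check around one m1qn3 iteration.
> >
> >         True if the quasi-Newton state looks unusable: the proposed
> >         direction d is uphill (<d, g> >= 0) or the newest stored pair
> >         has non-positive curvature (<y, s> <= 0).  A wrapper could then
> >         wipe the stored (s, y) pairs and restart from the current x
> >         with d = -g (steepest descent).
> >         """
> >         return np.dot(d, g) >= 0.0 or np.dot(y, s) <= 0.0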
> >
> > Longer version:
> >
> > Define s_k = x_{k+1} - x_k, the state increment.
> > Define y_k = g_{k+1} - g_k, the gradient increment.
> >
> > BFGS computes an approximation H_{k+1} for the inverse Hessian, from
> H_k, s_k, and y_k. If H_k is symmetric then H_{k+1} is also symmetric.
> Furthermore, if H_k is positive definite and <y, s> > 0, H_{k+1} is also
> positive definite. <y, s> > 0 holds for sensible update steps in a
> convex optimization problem, but not in general (think of a parabola,
> but imagine it briefly steepening as you approach the minimum due to
> nonlinearities). So an H that is no longer positive definite can be
> generated.
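> >
> > For reference, the textbook form of that inverse-Hessian update (written
> > in NumPy, not copied from m1qn3's source) is:
> >
> >     import numpy as np
> >
> >     def bfgs_inverse_update(H, s, y):
> >         """H_{k+1} = (I - rho*s*y^T) H_k (I - rho*y*s^T) + rho*s*s^T,
> >         with rho = 1/<y, s>.  If H_k is symmetric positive definite and
> >         <y, s> > 0, then H_{k+1} is too; if <y, s> <= 0 the update is
> >         not safe to apply."""
> >         rho = 1.0 / np.dot(y, s)
> >         I = np.eye(len(s))
> >         V = I - rho * np.outer(s, y)
> >         return V @ H @ V.T + rho * np.outer(s, s)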
> >
> > The version used is actually L-BFGS, which maintains a diagonal matrix
> D_k, and forms H_k from D_k and the m most recent pairs (s_k, y_k), but the
> same things hold.
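> >
> > In L-BFGS the product H_k*g is never formed explicitly; it comes out of
> > the usual two-loop recursion over the stored pairs, roughly as in this
> > textbook sketch, with D_k taken as a plain diagonal scaling:
> >
> >     import numpy as np
> >
> >     def lbfgs_direction(g, pairs, d0):
> >         """Two-loop recursion: returns d = -H_k g.
> >
> >         pairs: the m most recent (s_i, y_i), oldest first.
> >         d0:    diagonal of the initial matrix D_k.
> >         """
> >         q = g.astype(float).copy()
> >         alphas = []
> >         for s, y in reversed(pairs):              # newest to oldest
> >             rho = 1.0 / np.dot(y, s)
> >             a = rho * np.dot(s, q)
> >             q -= a * y
> >             alphas.append((rho, a))
> >         r = d0 * q                                # apply the diagonal D_k
> >         for (s, y), (rho, a) in zip(pairs, reversed(alphas)):
> >             b = rho * np.dot(y, r)
> >             r += (a - b) * s
> >         return -r
> >
> > Any stored pair with <y, s> <= 0 can destroy positive definiteness of
> > the implied H_k.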
> >
> >
> >
> > On Tue, 12 Feb 2019 at 16:11, Andrew McRae <
> andrew.mcrae at physics.ox.ac.uk> wrote:
> > Actually I was wrong: it's the first Wolfe test being violated. I.e.
> f(x + t*d) turns out to be larger than f(x), even for very small t, where t
> is the step multiplier and d is the search direction.
> >
> > The values being printed out are t, f(x + t*d) - f(x), and <d, g>, where
> g is the gradient. The second number should be negative for small enough
> t, and the third number should definitely be negative else it's not a
> descent direction.
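> >
> > Schematically, the test on those two quantities looks something like
> > this (the constant 1e-4 is a typical Armijo value; I haven't checked
> > what m1qn3 actually uses):
> >
> >     def armijo_ok(f0, ft, t, dg0, c1=1e-4):
> >         """First Wolfe (sufficient decrease) test.
> >
> >         f0 = f(x), ft = f(x + t*d), dg0 = <d, g(x)> (should be < 0).
> >         Accept the step only if the actual decrease is at least a
> >         fraction c1 of what the linear model t*<d, g> predicts.
> >         """
> >         return ft - f0 <= c1 * t * dg0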
> >
> > In fact, even in a successful run, this occasionally happens (second
> number negative = good, but the third number is often positive!)
> >
> > mlis3 1.000D+00 -1.108D+00 -9.114D-01
> > mlis3 1.000D+00 -1.379D-01 6.624D+00
> > mlis3 1.000D+00 -5.767D-01 4.122D+00
> > mlis3 1.000D+00 -1.120D+00 -7.614D-01
> > mlis3 1.000D+00 -5.014D-01 -3.579D-01
> > mlis3 1.000D+00 -6.903D-01 -1.962D-01
> > mlis3 1.000D+00 -1.657D-01 1.824D-01
> >
> > I'm now confused, because the lines
> https://github.com/dorugeber/MITgcm/blob/optim_m1qn3/optim_m1qn3/m1qn3_offline.F#L551-L559
> seem to guard against the case d.g > 0.
> >
> > On Tue, 12 Feb 2019 at 12:02, Andrew McRae <
> andrew.mcrae at physics.ox.ac.uk> wrote:
> > This is using the optim_m1qn3 package from mitgcm_contrib.
> >
> > Quite often, the algorithm runs a few steps, then gets stuck in a line
> search. E.g., after 3 good iterations, I get
> >
> > m1qn3: iter 4, simul 4, f= 3.25818816D+00, h'(0)=-2.84510D+00
> >
> > m1qn3: line search
> >
> > mlis3 fpn=-2.845D+00 d2= 9.74D+01 tmin= 5.72D-07 tmax= 1.00D+20
> > mlis3    1.000D+00   8.992D-01    1.438D+00
> > mlis3    2.468D-01   1.155D-01    6.129D-01
> > mlis3    2.468D-03   7.656D-04    2.227D+02
> > mlis3    2.345D-03   7.240D-04    1.019D+01
> > mlis3    1.759D-03   5.496D-04    2.970D+00
> > mlis3    1.231D-03   3.855D-04   -3.000D+00
> > mlis3    8.618D-04   2.713D-04    3.105D-01
> > mlis3    6.032D-04   1.894D-04    2.202D-01
> > mlis3    4.223D-04   1.341D-04    2.223D+00
> > mlis3    2.956D-04   9.450D-05    3.954D-01
> > mlis3    2.069D-04   6.755D-05    3.121D-01
> > mlis3    6.207D-05   1.948D-05    3.642D-01
> >
> > This is for the included tutorial_global_oce_optim running for 1 year,
> and with mult_hflux_tut set to 0.2 rather than 2.
> >
> > To be honest, I'm not even sure what numbers are being printed, but I
> suspect one of those first two numbers is the step size multiplier. I.e.
> it's trying to take smaller and smaller steps, but these are still being
> rejected.
> >
> > I'm about to dive in with gdb and see what's going on, but my hypothesis
> is that the second Wolfe test is being violated. Roughly speaking, this
> forces the gradient to decrease by 10% each iteration (at least, the
> component of the gradient in the descent direction). This makes sense once
> the algorithm is in the basin near the minimizer of the cost function, but
> there's no a priori reason for it to hold further away.
> >
> > Is there likely to be anything wrong with modifying this check to allow
> (some) gradient steepening? I guess I'll find out...
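> >
> > Spelled out, the check I'd be relaxing is the curvature condition below
> > (schematic; c2 = 0.9 is a common default, I haven't confirmed what
> > m1qn3 sets):
> >
> >     def curvature_ok(dg0, dgt, c2=0.9):
> >         """Second Wolfe test (curvature condition).
> >
> >         dg0 = <d, g(x)>        slope along d at the old point (< 0)
> >         dgt = <d, g(x + t*d)>  slope along d at the trial point
> >         Requires the slope to have become less negative by a factor
> >         c2.  Allowing some steepening would mean taking c2 > 1, or
> >         skipping this test while still far from the minimum.
> >         """
> >         return dgt >= c2 * dg0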
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
>