[MITgcm-support] tutorial_global_oce_optim optimisation failed

Thu May 3 04:46:23 EDT 2018

Hi Andrew,

The FD gradient is used for checking the AD gradient in a few (very few!) places. I don’t know why it is zero in your case, but assuming that the AD gradient is correct, you don’t need the FD gradient at all (I would actually strongly recommend turning off the grdchk pkg for any optimization exercise).
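For readers unfamiliar with this kind of check, here is a minimal, generic sketch of comparing an exact gradient against a finite-difference estimate at a few selected components. It is not the grdchk implementation: the cost function, the function names, the step size eps and the choice of a central difference are all assumptions made for illustration.

```python
import math


def cost(x):
    """Hypothetical smooth cost function standing in for the model's cost."""
    return sum(0.5 * (xi - 1.0) ** 2 + math.sin(xi) for xi in x)


def analytic_gradient(x):
    """Exact gradient of cost(); plays the role of the AD (adjoint) gradient."""
    return [(xi - 1.0) + math.cos(xi) for xi in x]


def fd_component(f, x, i, eps=1.0e-4):
    """Central finite-difference estimate of d f / d x[i]."""
    xp = list(x)
    xm = list(x)
    xp[i] += eps
    xm[i] -= eps
    return (f(xp) - f(xm)) / (2.0 * eps)


def gradient_check(f, grad, x, indices, eps=1.0e-4):
    """Compare the exact and finite-difference gradient at selected indices only."""
    g = grad(x)
    return [(i, g[i], fd_component(f, x, i, eps)) for i in indices]


# Check just two of the five components, mirroring the "very few places" idea.
x0 = [0.5, -0.3, 1.2, 2.0, -1.5]
checks = gradient_check(cost, analytic_gradient, x0, indices=[0, 3])
for i, ad, fd in checks:
    print(f"component {i}: analytic = {ad:.10e}, finite-diff = {fd:.10e}")
```

Each finite-difference component costs extra evaluations of the cost function (here two), which is why such a check is only done at a handful of points and is not needed for the optimization itself.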

Make sure you do a “cvs update” on the optim_m1qn3 directory, because I added a fix for the funny cost function value yesterday.

Martin

> On 2. May 2018, at 19:34, Andrew McRae <andrew.mcrae at physics.ox.ac.uk> wrote:
> 
> Thanks for this.
> 
> Just as a sanity check, before I involve optim_m1qn3 again, the output of my ./testreport -t tutorial_global_oce_optim -oad includes
> 
> There were 16 decimal places of similarity for "ADM CostFct"
> There were 16 decimal places of similarity for "ADM Ad Grad"
> There were 0 decimal places of similarity for "ADM FD Grad"
> 
> Should I be concerned about this?
> 
> E.g. lines 2116-2118 of my output_oadm.txt file are
> 
> (PID.TID 0000.0001)  ADM  ref_cost_function      =  6.20023228182329E+00
> (PID.TID 0000.0001)  ADM  adjoint_gradient       = -2.69091500991183E-06
> (PID.TID 0000.0001)  ADM  finite-diff_grad       =  0.00000000000000E+00
> 
> But at least my cost function value is the same:
> 
> (PID.TID 0000.0001)   local fc =  0.620023228182329D+01
> (PID.TID 0000.0001)  global fc =  0.620023228182329D+01
> 
> Andrew
> 
> On 2 May 2018 at 10:34, Martin Losch <Martin.Losch at awi.de> wrote:
> Hi Andrew,
> 
> I won’t be able to help you much with the optim/lsopt code, because I would have to get it running again myself. But I do recommend using the MITgcm_contrib/mlosch/optim_m1qn3 code. It’s not very well documented, but I am attaching a skeleton script to illustrate how to use it. Please give it a try and if you find it useful, I can add this script to the repository.
> 
> The two versions of the optimization routine are similar: both implement the same optimization algorithm (BFGS). However, optim_m1qn3 uses a later version of the m1qn3 code, and I think it’s easier to compile (only one Makefile). I also believe (but there’s debate about this) that it does the right thing, as opposed to the optim/lsopt variant, which somehow truncates the optimization in each iteration. Having said that, I have used both in parallel, and the reduction of the cost function (which is really all we care about) is sometimes better with the optim_m1qn3 code and sometimes better with the optim/lsopt code. The optim_m1qn3 code is closer to the idea of the original m1qn3 code.
> 
> Let me know if you can use my attached instructions.
> 
> Martin
> 
> 
> 
> > On 1. May 2018, at 00:00, Andrew McRae <andrew.mcrae at physics.ox.ac.uk> wrote:
> > 
> > Right, but the cost function is the same value each time, the norm of x is 0 each time, and the norm of g is the same each time.  This suggests nothing is happening.  It's a bit ridiculous that one of the core tutorials simply isn't working out of the box...
> > 
> > I will have a go at debugging.
> > 
> > Andrew
> > 
> > On 30 April 2018 at 22:54, Matthew Mazloff <mmazloff at ucsd.edu> wrote:
> > Well, you are correct that it’s not actually taking a step, because the norm of the control vector is 0:
> >>> norm of x................... 0.00000000E+00
> > meaning the controls are all 0 still.
> > 
> > However the gradients are non-zero
> >>> norm of g................... 0.12730927E-01
> > so the linesearch should step and 
> > ecco_ctrl_MIT_CE_000.opt0001 
> > should not be all zero. 
> > 
> > To debug this, you could put a print statement in optim_writedata.F to see what it is writing.
> > 
> > I don’t know enough about this tutorial to be a bigger help, sorry.
> > 
> > Matt
> > 
> > 
> >> On Apr 30, 2018, at 2:50 PM, Andrew McRae <andrew.mcrae at physics.ox.ac.uk> wrote:
> >> 
> >> Yes, I did.
> >> 
> >> On 30 April 2018 at 22:42, Matthew Mazloff <mmazloff at ucsd.edu> wrote:
> >> This is still iteration 0. You have to update data.optim to tell it you are now at iteration 1.
> >> 
> >> Matt
> >> 
> >> 
> >>> On Apr 30, 2018, at 2:38 PM, Andrew McRae <andrew.mcrae at physics.ox.ac.uk> wrote:
> >>> 
> >>> I tried a few steps of this, but the output of optim.x always has
> >>> 
> >>>   cost function............... 0.62002323E+01
> >>>   norm of x................... 0.00000000E+00
> >>>   norm of g................... 0.12730927E-01
> >>> 
> >>> near the end, with no decrease in the cost function.  So I guess it's not actually taking the step?
> >>> 
> >>> Andrew
> >>> 
> >>> On 27 April 2018 at 18:04, Andrew McRae <andrew.mcrae at physics.ox.ac.uk> wrote:
> >>> !!!  Okay...
> >>> 
> >>> Yes, it produced the .opt0001 file.  I'll see how this goes.
> >>> 
> >>> Thanks,
> >>> Andrew
> >>> 
> >>> On 27 April 2018 at 17:57, Matthew Mazloff <mmazloff at ucsd.edu> wrote:
> >>> Hello
> >>> 
> >>> It’s been a while, but I am pretty sure that is the normal output. It says “fail”, but it did give you a new ecco_ctrl_MIT_CE_000.opt0001 (correct?), and if you unpack it and run, the cost will likely descend.
> >>> 
> >>> I think it worked correctly. lsopt/optim are just confusing… but I think it’s working. I think all is good!
> >>> 
> >>> Matt
> >>> 
> >>> 
> >>> 
> >>>> On Apr 27, 2018, at 8:25 AM, Andrew McRae <andrew.mcrae at physics.ox.ac.uk> wrote:
> >>>> 
> >>>> Just separating this from the other thread, I got the bundled MITgcm optim routine built (having made these changes, based on this thread from 2010 and this one from 2016).
> >>>> 
> >>>> I use OpenAD to create the adjoint.
> >>>> 
> >>>> My steps are:
> >>>> 1) in the build directory, run ../../../tools/genmake2 -oad -mods=../code_oad
> >>>> 2) run make depend and make adAll
> >>>> 3) copy input_oad/ into a new folder scratch/
> >>>> 4) within scratch/, run ./prepare_run
> >>>> 5) copy mitgcmuv_ad from build/ into scratch/, copy optim.x into scratch/OPTIM/
> >>>> 6) run ./mitgcmuv_ad
> >>>> 7) in scratch/OPTIM, create symlinks to ../data.optim and ../data.ctrl
> >>>> 8) copy the files ecco_cost_MIT_CE_000.opt0000 and ecco_ctrl_MIT_CE_000.opt0000 into the OPTIM subdirectory
> >>>> 9) run ./optim.x within the subdirectory
> >>>> 
> >>>> The full output is attached, but I assume the optimisation failed since the last lines are
> >>>> 
> >>>>   optimization stopped because :
> >>>>   ifail =   4    the search direction is not a descent one
> >>>> 
> >>>> Any ideas?  (I guess this isn't something that is tested in the daily builds?)
> >>>> 
> >>>> In the meantime, I'll try the m1qn3 routine as in the other thread, which should help distinguish between a problem with the optimisation routine or the gradient generated by mitgcmuv_ad.
> >>>> 
> >>>> Andrew
> >>>> <out.txt>
> >>>> _______________________________________________
> >>>> MITgcm-support mailing list
> >>>> MITgcm-support at mitgcm.org
> >>>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support