[MITgcm-support] optim_m1qn3: maximum iterations reached?

Mon Oct 5 08:36:17 EDT 2020

Hello Martin

Deepest apologies for the slow reply to this.

After your email I modified epsg -- I didn't even realise this was a user
parameter. It turns out that the precipitating issue was that my initial cost
function was 4 orders of magnitude larger than anticipated (which I hadn't
even noticed, and still don't know why), meaning that the termination
condition was met earlier than the point at which I would consider the
optimisation to be at a "good" solution. And, since I have a lazy script
that does not look at the termination condition, I was simply "restarting"
without realising it, but with nsim or niter set to quite small values.
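
To illustrate why an inflated starting point can trigger the epsg test
early, here is a small Python sketch. It assumes that the "realized relative
precision on g" printed by m1qn3 is the current gradient norm divided by the
initial gradient norm, and that omode=1 is returned once this ratio drops
below epsg. The function names are made up for the example, the initial
gradient norm is inferred from the numbers in the quoted m1qn3 output (it is
not printed there), and the factor of 1e4 assumes that the gradient was
inflated by the same factor as the cost function.

```python
def relative_precision(g_norm, g0_norm):
    """Gradient norm relative to the initial gradient norm (assumed definition)."""
    return g_norm / g0_norm


def epsg_test_satisfied(g_norm, g0_norm, epsg):
    """True if the assumed epsg stopping test would be met."""
    return relative_precision(g_norm, g0_norm) < epsg


# Values taken from the quoted m1qn3 output
g_final = 2.34699335       # two-norm of g at exit
eps_realized = 6.59e-07    # realized relative precision on g
epsg = 1e-6                # threshold used in that run

# Initial gradient norm implied by the two numbers above (an inference)
g0_implied = g_final / eps_realized

# With the large initial gradient, the test is met at this gradient norm
met_with_large_g0 = epsg_test_satisfied(g_final, g0_implied, epsg)

# Hypothetical: same final gradient, initial gradient 1e4 times smaller
met_with_small_g0 = epsg_test_satisfied(g_final, g0_implied / 1e4, epsg)

print(met_with_large_g0, met_with_small_g0)
```

Under these assumptions the same final gradient passes the test only in the
first case, which is why a smaller epsg lets the optimisation continue.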

I guess I'm a bit agnostic as to whether these parameters (nsim, niter)
should be reset upon restarting the optimisation. I suppose it would be
nice to be able to do a "warm start" (which I assume is in contrast to a cold
start, where previously calculated gradients are ignored) and still be able
to set these parameters, rather than having them be unchangeable and based on
previous iterations? But this is the first time I have run into this issue,
and it was because I did not know I could change epsg... so maybe it is not
worth the trouble?
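
As an alternative to changing how nsim and niter are handled, the calling
script can check whether m1qn3 has finished, as suggested in the quoted
message below. The following Python sketch shows one way to do that. It
assumes that the line "m1qn3: output mode is N" appears in the m1qn3 text
output only once m1qn3 has terminated; the function names, the default log
file name, and the stop/continue policy are illustrative choices, not part
of optim_m1qn3.

```python
import re
from pathlib import Path

# Line printed by m1qn3 on exit, e.g. "m1qn3: output mode is  1"
MODE_RE = re.compile(r"m1qn3: output mode is\s+(\d+)")

# Only the output modes explained in this thread are listed here
MODE_MEANING = {
    1: "epsg test satisfied",
    4: "maximum number of iterations reached",
    5: "maximum number of simulations reached",
}


def last_output_mode(text):
    """Return the last output mode found in the log text, or None."""
    modes = MODE_RE.findall(text)
    return int(modes[-1]) if modes else None


def next_action(mode):
    """Hypothetical policy: keep looping until m1qn3 reports any output mode."""
    return "continue" if mode is None else "stop"


def opwarm_files(run_dir="."):
    """List the OPWARM* files that would be removed before a cold restart."""
    return sorted(Path(run_dir).glob("OPWARM*"))


def check_log(log_path="m1qn3_output.txt"):
    """Read the log (if present) and return (mode, action)."""
    path = Path(log_path)
    text = path.read_text() if path.exists() else ""
    mode = last_output_mode(text)
    return mode, next_action(mode)


if __name__ == "__main__":
    mode, action = check_log()
    print(mode, MODE_MEANING.get(mode, "no description listed"), action)
```

The loop in the script would call something like check_log() after each
call to optim_m1qn3 and exit when the action is "stop". To continue after
omode=1, one would then choose a smaller epsg and do a cold restart after
removing the files returned by opwarm_files().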

Best
Dan

On Fri, Sep 25, 2020 at 8:24 AM Martin Losch <Martin.Losch at awi.de> wrote:

> Hi Dan,
>
> I started to modify optim_m1qn3 on github, maybe we can move any further
> discussion to the issue page (
> https://github.com/mjlosch/optim_m1qn3/issues/4)? I’d be happy to hear
> your suggestions.
>
> Martin
>
> > On 24. Sep 2020, at 12:08, Martin Losch <Martin.Losch at awi.de> wrote:
> >
> > Hi Dan,
> >
> > I think this is only related to your script (and the absence of a
> stopping condition), see below:
> >
> >> On 24. Sep 2020, at 11:21, Daniel Goldberg <dan.goldberg at ed.ac.uk>
> wrote:
> >>
> >> Hi Martin
> >>
> >> Following up from my email to support, i attach the m1qn3 text output
> as it is too large to include in a message list email:
> >>
> >> I attach the cshell script that i have been using to call optim_m1qn3
> and mitgcmuv_ad. As you can see there are no checks for termination.
> >>
> >> m1qn3_output.txt.omode6 is from an optimisation that ends in,
> obviously, omode=6.
> >>
> >> m1qn3_output.txt.omode4 is an optimisation that "ends" with omode=4 --
> but is a bit confusing. now that i look more closely, i think it ends with
> omode=1 (line 450), and then seems to keep on going as i keep on calling
> m1qn3 -- possibly an unintended cold start? I know that the cost function
> can get quite a bit smaller than it is at this point, so I think continuing
> is the right thing to do.
> > It ends successfully with omode=1, because epsg = 1e-6 is satisfied
> (l453:      realized relative precision on g:  6.59E-07)
> > If you want to get any better you should run this with a smaller epsg
> > On exit nsim and niter are overwritten by m1qn3 to store the actual
> simulations/iterations done in the optimization:
> > m1qn3: output mode is  1
> >     number of iterations:             22
> >     number of simulations:            24
> >     realized relative precision on g:  6.59E-07
> >     f             =  8.44876558E+02
> >     two-norm of g =  2.34699335E+00
> >
> > optim_m1qn3 then saves these numbers to OPWARM … and then you (probably)
> do a warm start with these new values of niter=22 and nsim=24 and then
> m1qn3 stops rightfully after 22 iterations.
> >
> > I am not sure, if we want to make m1qn3 control the loop in the script
> by simply stalling (similar to m1qn3 which stalls when niter=0 later on). I
> think it is necessary to stop the loop in the script when m1qn3 thinks it’s
> done (i.e. by grepping "m1qn3: output mode" in m1qn3_output.txt?).
> > You can then do a cold restart after removing all OPWARM* files.
> >
> > what do you think?
> >
> > Martin
> >
> >>
> >> However, at lines 1352, 1809, and 2266 it seems to terminate with
> omode=4 or 5, which is what initially prompted me to email you, as both the
> number of iterations and the number of simulations are a lot smaller than
> what I imagined the maximum would be. Perhaps a symptom of keeping the
> optimisation going after termination? At line 2294, omode is equal to 1 --
> seemingly the result we are after. However, I'm not sure if I can trust it,
> given that it comes immediately after an omode=4 termination. Is this all
> consistent with how you know optim_m1qn3 to work?
> >>
> >> Thanks for taking the time to look at this. Apologies for potentially
> using an out of date m1qn3. I am pretty confident i've not modified the
> source though.
> >>
> >> Best
> >> Dan
> >>
> >>
> >> On Thu, Sep 24, 2020 at 9:59 AM Daniel Goldberg <dan.goldberg at ed.ac.uk>
> wrote:
> >> Hi Martin
> >>
> >> Thank you very much for your helpful response. Thank you as well for
> directing me to m1qn3_output.txt -- now i recall your directing me to this
> before, which makes the "trace" of the optimisation much easier to follow.
> >>
> >> I am skeptical as to whether this could be reproduced with a simpler
> run. I am using optim_m1qn3 at the moment to carry out an inversion of
> surface properties to yield basal properties with the package STREAMICE. I
> have done two separate optimisations, both very similar but using slightly
> different resolutions (1 km vs 1500m); and the slightly coarser simulation
> seemed to terminate in a slightly better state (omode=6; not the omode=1 i
> would hope for, but with a cost function progression i would expect for
> this type of problem).
> >>
> >> It is very possible I am continuing the optimisation past a point where
> i should stop it, as out of pure laziness and slight ignorance i don't have
> appropriate termination conditions in the calling shell script.
> >>
> >> It is also somewhat possible I have not pulled from git recently
> enough, or even that i have unwittingly changed a source file; and it has
> been so long since i cloned from git (copying the source for each
> experiment) i cannot even recall where the cloned repo is, if i still have
> it. So this is something I can check. I can also re-run the optimisation
> with the debug message as you suggest, but this will take some time as the
> experiment is run on a supercomputer with long queues during the day.
> >>
> >> Perhaps the best thing is for me to send you m1qn3_output.txt for both
> experiments (via direct email as they are large) as well as my calling
> script. After this i will try cloning from the source and rerunning the
> optimisation (which will be queued for some time) and see if it does the
> same.
> >>
> >> Best
> >> Dan
> >>
> >> On Thu, Sep 24, 2020 at 8:49 AM Martin Losch <Martin.Losch at awi.de>
> wrote:
> >> Hi Dan,
> >>
> >> thanks for using this routine. Do you think that we can reproduce this
> somewhat odd behavior with a simple optimization (i.e. with a cheap
> costfunction like the “testbed” in optim_m1qn3)?
> >>
> >> nsim = numiter*nfunc should not change during the optimization, only at
> the (successful) end, it is overwritten somewhere in m1qn3a or so to store
> the actual number of simulations. So my only guess is that you
> (accidentally) restart the optimization (with modified parameters) after
> m1qn3 thinks it’s over?
> >>
> >> Maybe it would be helpful to have a look at the output of optim_m1qn3,
> but also of m1qn3 itself (if you didn’t change it:
> fname_m1qn3='output_m1qn3.txt'), and also at the calling sequence (script).
> Depending on the size, you can send it directly to me.
> >>
> >> Martin
> >>
> >>> On 24. Sep 2020, at 09:26, Daniel Goldberg <dan.goldberg at ed.ac.uk>
> wrote:
> >>>
> >>> Hi Martin
> >>>
> >>> I am using optim_m1qn3 (installed from your github repo).
> >>>
> >>> I have been using it in optimisations with data.optim parameters as
> follows:
> >>>
> >>> &OPTIM
> >>> optimcycle=0,
> >>> numiter=1000,
> >>> nfunc=10,
> >>> dfminfrac=0.001,
> >>> iprint=10,
> >>> nupdate=5,
> >>> /
> >>>
> >>> and am seeing the optimisation terminate with omode=5 after about 100
> iterations. The manual for m1qn3 suggests the maximum number of simulations
> has been reached, which is a lot less than numiter*nfunc, which (according
> to the git readme) is the max number of simulations. (Though i do note that
> "nsim" in the text output from the optim_m1qn3 executable changes at some
> point in the optimisation from its initial value of 10000.)
> >>>
> >>> I then ran the optimisation again with nfunc=20 to see what would
> happen -- this time there is a termination with omode=4, the maximum number
> of iterations being reached -- this happened at optimcycle=116. Here, i
> notice that each subsequent optimcycle gives omode=1 in the optim_m1qn3
> output -- I'm not sure if this is significant.
> >>>
> >>> I was wondering if this behaviour makes sense to you, and if you would
> be able to explain it? Happy to provide more output from optim_m1qn3 (from
> the more recent optimisation with nfunc=20; the previous nfunc=10 output is
> deleted).
> >>>
> >>> Many thanks
> >>> Dan
> >>>
> >>> --
> >>> --- PLEASE NOTE THAT I AM CURRENTLY WORKING FROM HOME AS A MEASURE OF
> SOCIAL DISTANCING DURING THE COVID-19 PANDEMIC ---
> >>>
> >>> Daniel Goldberg, PhD
> >>> Reader in Glaciology
> >>> School of Geosciences, University of Edinburgh
> >>> Geography Building, Drummond Street, Edinburgh EH8 9XP
> >>>
> >>>
> >>> em: dan.goldberg at ed.ac.uk
> >>> web: https://www.geos.ed.ac.uk/homes/dgoldber
> >>> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> >>> _______________________________________________
> >>> MITgcm-support mailing list
> >>> MITgcm-support at mitgcm.org
> >>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
> >>
> >> <opt_script.csh><m1qn3_output.txt.omode6><m1qn3_output.txt.omode4>
>
