[MITgcm-support] optim_m1qn3: maximum iterations reached?

Martin Losch Martin.Losch at awi.de
Fri Sep 25 03:23:33 EDT 2020


Hi Dan,

I have started to modify optim_m1qn3 on GitHub. Maybe we can move any further discussion to the issue page (https://github.com/mjlosch/optim_m1qn3/issues/4)? I’d be happy to hear your suggestions.

Martin
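
For reference, the stopping check suggested in the quoted reply below (looking for "m1qn3: output mode" in m1qn3_output.txt, and doing a cold restart only after removing the OPWARM* files) could be sketched roughly as follows. This is a minimal Python sketch, not part of optim_m1qn3 or of the attached csh script. The function names are made up for illustration, and it assumes that the summary line only appears once m1qn3 considers the optimization finished and that the log is written to m1qn3_output.txt in the run directory.

```python
import re
from pathlib import Path

# Summary line printed by m1qn3 on exit, e.g. "m1qn3: output mode is  1"
OMODE_RE = re.compile(r"m1qn3: output mode is\s+(\d+)")


def last_output_mode(text):
    """Return the last omode reported in the log text, or None if there is none."""
    modes = OMODE_RE.findall(text)
    return int(modes[-1]) if modes else None


def should_stop(log_path="m1qn3_output.txt"):
    """True once the log contains an output-mode line (assumed to mean m1qn3 is done)."""
    path = Path(log_path)
    if not path.exists():
        return False
    return last_output_mode(path.read_text(errors="replace")) is not None


def remove_warm_start_files(run_dir="."):
    """Delete all OPWARM* files in run_dir so that the next call is a cold start."""
    removed = []
    for f in list(Path(run_dir).glob("OPWARM*")):
        f.unlink()
        removed.append(f.name)
    return sorted(removed)
```

A driver loop would call should_stop() after each optim_m1qn3 step and leave the loop as soon as it returns True. remove_warm_start_files() would only be called when a fresh (cold) start is really intended.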

> On 24. Sep 2020, at 12:08, Martin Losch <Martin.Losch at awi.de> wrote:
> 
> Hi Dan,
> 
> I think this is only related to your script (and the absence of a stopping condition), see below:
> 
>> On 24. Sep 2020, at 11:21, Daniel Goldberg <dan.goldberg at ed.ac.uk> wrote:
>> 
>> Hi Martin
>> 
>> Following up from my email to support, i attach the m1qn3 text output as it is too large to include in a message list email:
>> 
>> I attach the cshell script that i have been using to call optim_m1qn3 and mitgcmuv_ad. As you can see there are no checks for termination.
>> 
>> m1qn3_output.txt.omode6 is from an optimisation that ends in, obviously, omode=6. 
>> 
>> m1qn3_output.txt.omode4 is an optimisation that "ends" with omode=4 -- but is a bit confusing. Now that i look more closely, i think it ends with omode=1 (line 450), and then seems to keep on going as i keep on calling m1qn3 -- possibly an unintended cold start? I know that the cost function can get quite a bit smaller than it is at this point, so I think continuing is the right thing to do.
> It ends successfully with omode=1, because epsg = 1e-6 is satisfied (line 453: realized relative precision on g:  6.59E-07).
> If you want to get any better, you should run this with a smaller epsg.
> On exit nsim and niter are overwritten by m1qn3 to store the actual simulations/iterations done in the optimization:
> m1qn3: output mode is  1
>     number of iterations:             22
>     number of simulations:            24
>     realized relative precision on g:  6.59E-07
>     f             =  8.44876558E+02
>     two-norm of g =  2.34699335E+00
> 
> optim_m1qn3 then saves these numbers to OPWARM … and then you (probably) do a warm start with these new values of niter=22 and nsim=24, so that m1qn3 rightfully stops after 22 iterations.
> 
> I am not sure if we want to make m1qn3 control the loop in the script by simply stalling (similar to m1qn3, which stalls when niter=0 later on). I think it is necessary to stop the loop in the script when m1qn3 thinks it’s done (e.g. by grepping for "m1qn3: output mode" in m1qn3_output.txt?).
> You can then do a cold restart after removing all OPWARM* files.
> 
> What do you think?
> 
> Martin
> 
>> 
>> However, at lines 1352, 1809, and 2266 it seems to terminate with omode=4 or 5, which is what initially prompted me to email you, as both the number of iterations and the number of simulations are a lot smaller than what I imagined the maximum would be. Perhaps a symptom of keeping the optimisation going after termination? At line 2294, omode is equal to 1 -- seemingly the result we are after. However, I'm not sure if i can trust it, given that it comes immediately after an omode=4 termination. Is this all consistent with how you know optim_m1qn3 to work?
>> 
>> Thanks for taking the time to look at this. Apologies for potentially using an out of date m1qn3. I am pretty confident i've not modified the source though.
>> 
>> Best
>> Dan
>> 
>> 
>> On Thu, Sep 24, 2020 at 9:59 AM Daniel Goldberg <dan.goldberg at ed.ac.uk> wrote:
>> Hi Martin
>> 
>> Thank you very much for your helpful response. Thank you as well for directing me to m1qn3_output.txt -- now i recall your directing me to this before, which makes the "trace" of the optimisation much easier to follow.
>> 
>> I am skeptical as to whether this could be reproduced with a simpler run. I am using optim_m1qn3 at the moment to carry out an inversion of surface properties to yield basal properties with the package STREAMICE. I have done two separate optimisations, both very similar but using slightly different resolutions (1 km vs 1500 m); and the slightly coarser simulation seemed to terminate in a slightly better state (omode=6; not the omode=1 i would hope for, but with a cost function progression i would expect for this type of problem).
>> 
>> It is very possible I am continuing the optimisation past a point where i should stop it, as out of pure laziness and slight ignorance i don't have appropriate termination conditions in the calling shell script. 
>> 
>> It is also somewhat possible I have not pulled from git recently enough, or even that i have unwittingly changed a source file; and it has been so long since i cloned from git (copying the source for each experiment) i cannot even recall where the cloned repo is, if i still have it. So this is something I can check. I can also re-run the optimisation with the debug message as you suggest, but this will take some time as the experiment is run on a supercomputer with long queues during the day. 
>> 
>> Perhaps the best thing is for me to send you m1qn3_output.txt for both experiments (via direct email as they are large) as well as my calling script. After this i will try cloning from the source and rerunning the optimisation (which will be queued for some time) and see if it does the same.
>> 
>> Best
>> Dan
>> 
>> On Thu, Sep 24, 2020 at 8:49 AM Martin Losch <Martin.Losch at awi.de> wrote:
>> Hi Dan,
>> 
>> Thanks for using this routine. Do you think that we can reproduce this somewhat odd behavior with a simple optimization (i.e. with a cheap cost function like the “testbed” in optim_m1qn3)?
>> 
>> nsim = numiter*nfunc should not change during the optimization; only at the (successful) end is it overwritten, somewhere in m1qn3a or so, to store the actual number of simulations. So my only guess is that you (accidentally) restart the optimization (with modified parameters) after m1qn3 thinks it’s over?
>> 
>> Maybe it would be helpful to have a look at the output of optim_m1qn3, but also of m1qn3 itself (if you didn’t change it: fname_m1qn3='output_m1qn3.txt'), and also at the calling sequence (script). Depending on the size, you can send it directly to me.
>> 
>> Martin
>> 
>>> On 24. Sep 2020, at 09:26, Daniel Goldberg <dan.goldberg at ed.ac.uk> wrote:
>>> 
>>> Hi Martin
>>> 
>>> I am using optim_m1qn3 (installed from your github repo).
>>> 
>>> I have been using it in optimisations with data.optim parameters as follows:
>>> 
>>> &OPTIM
>>> optimcycle=0,
>>> numiter=1000,
>>> nfunc=10,
>>> dfminfrac=0.001,
>>> iprint=10,
>>> nupdate=5,
>>> /
>>> 
>>> and am seeing the optimisation terminate with omode=5 after about 100 iterations. The manual for m1qn3 suggests this means the maximum number of simulations has been reached, yet the number of simulations done is a lot less than numiter*nfunc, which (according to the git readme) is the max number of simulations. (Though i do note that "nsim" in the text output from the optim_m1qn3 executable changes at some point in the optimisation from its initial value of 10000.)
>>> 
>>> I then ran the optimisation again with nfunc=20 to see what would happen -- this time there is a termination with omode=4, the maximum number of iterations being reached -- this happened at optimcycle=116. Here, i notice that each subsequent optimcycle gives omode=1 in the optim_m1qn3 output -- I'm not sure if this is significant.
>>> 
>>> I was wondering if this behaviour makes sense to you, and if you would be able to explain it? Happy to provide more output from optim_m1qn3 (from the more recent optimisation with nfunc=20; the previous nfunc=10 output is deleted).
>>> 
>>> Many thanks
>>> Dan
>>> 
>>> -- 
>>> --- PLEASE NOTE THAT I AM CURRENTLY WORKING FROM HOME AS A MEASURE OF SOCIAL DISTANCING DURING THE COVID-19 PANDEMIC ---
>>> 
>>> Daniel Goldberg, PhD
>>> Reader in Glaciology
>>> School of Geosciences, University of Edinburgh
>>> Geography Building, Drummond Street, Edinburgh EH8 9XP
>>> 
>>> 
>>> em: dan.goldberg at ed.ac.uk
>>> web: https://www.geos.ed.ac.uk/homes/dgoldber
>>> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
>>> _______________________________________________
>>> MITgcm-support mailing list
>>> MITgcm-support at mitgcm.org
>>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
>> 
>> <opt_script.csh><m1qn3_output.txt.omode6><m1qn3_output.txt.omode4>


