[MITgcm-support] diagnosing problems with the adjoint
Holly Dail
hdail at MIT.EDU
Thu Aug 13 20:56:42 EDT 2009
I believe I have gotten both of these setups working now thanks to
the advice from the list -- thanks! I thought I'd report on which
fixes worked in case others run into the same issues ...
tutorial_global_oce_optim:
I can get a reduction in cost over at least 2 iterations with a
clean, current (Aug. 10) checkout of the GCM and two changes:
nTimeSteps = 360 in data and lastinterval = 31104000 in data.cost.
These changes are to be expected, since the tutorial as checked into
CVS must run very fast for testing purposes. The cost only goes down
for 1 iteration if one uses synchronous time stepping at 1800 sec; I
had thought asynchronous time stepping was problematic for the
adjoint, but it seems to work in this case.
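For concreteness, here are the two edits (the PARM03 and COST_NML
group names are my reading of a standard checkout, so double-check
them against your own files):

in data:
  &PARM03
   nTimeSteps = 360,
  &

in data.cost:
  &COST_NML
   lastinterval = 31104000.,
  &

If I'm reading the asynchronous setup right, the tracer time step is
86400 sec, so 360 steps is one 360-day year, and lastinterval =
31104000 sec (= 360*86400) makes the cost use the mean over that
final year.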
my North Atlantic setup:
I now have an adjoint that does not blow up, and I can get a
reduction in cost. I followed Matt's advice to use
autodiff_inadmode_set_ad.F and autodiff_inadmode_unset_ad.F to turn
off the KPP, Ptracers, GMRedi, and Sea Ice packages during the
adjoint sweep, and I set multiDimAdvection=.FALSE. in the data file.
I had thought that running different packages in the forward and
adjoint modes was supposed to be problematic for interpreting the
results, but regardless I'm really grateful to have a version that
works.
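In case it helps anyone, the guts of my autodiff_inadmode_set_ad.F
look roughly like the sketch below (written from memory, so treat the
subroutine name and the exact flag spellings as approximate; the
corresponding _unset_ routine just restores the saved values so the
forward sweep is unaffected):

      SUBROUTINE ADAUTODIFF_INADMODE_SET( myThid )
C     Called at the start of the adjoint (reverse) sweep:
C     switch off the packages that destabilize the adjoint.
      IMPLICIT NONE
#include "SIZE.h"
#include "EEPARAMS.h"
#include "PARAMS.h"
      INTEGER myThid
      useKPP      = .FALSE.
      useGMRedi   = .FALSE.
      usePTRACERS = .FALSE.
      useSEAICE   = .FALSE.
      RETURN
      END

(The multiDimAdvection=.FALSE. change just goes in &PARM01 of data.)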
I'm still interested in any scripts you may have that I could build
on for diagnosing problems with the adjoint in time; sending them
off-list might make sense.
Thanks to all for the help -
Holly
On Aug 12, 2009, at 1:57 PM, Holly Dail wrote:
> The confusion may be coming from the fact that the discussion was of
> two independent setups --
>
> (1) my North Atlantic setup (for debugging purposes, running for 1
> month at a timestep of 3600 sec)
> problem: adxx values are on the order of 10^16 and optim.x crashes
>
> (2) a clean checkout of tutorial_global_oce_optim with only 2
> changes (synchronous time stepping at 1800 sec and a 1-year run)
> problem: with a current version of the GCM and tutorial, the cost
> reduction fails at iteration 1
> [David was able to see cost reductions over 10+ iterations when he
> ran this a year ago, and I've tested with a March version of the
> GCM and had better results too]
>
> For (1) I am trying Matt's advice on turning off packages. I could
> also move to a faster machine and run for 1 year if that might help.
> For (2) I can try running with asynchronous time stepping, though I
> thought it was inadvisable for the adjoint.
>
> The west coast is always the right coast, which means I'm on the
> wrong coast!
> Holly
>
>
> On Aug 12, 2009, at 1:30 PM, Patrick Heimbach wrote:
>
>>
>> Holly,
>>
>> I am a bit confused now about what works in your setup and what
>> doesn't. Maybe, rather than the sensitivities blowing up, there's a
>> bug in the code(?)
>>
>> David's suggestion of starting from a clean tutorial setup is a
>> good one.
>> Also, indicate over which time span you run (20 timesteps? a year?
>> 10 years?).
>> Finally, try doing a tangent linear test to see whether there are
>> problems with store directives.
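>> (With a genmake2-generated Makefile, the tangent-linear executable
>> is typically built with "make ftlall" -- I'm going from memory on
>> the target name, so check your Makefile -- and grdchk can then
>> compare the TLM gradients against finite differences.)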
>>
>> Sorry for being on the wrong coast right now.
>>
>> -p.
>>
>> On Aug 12, 2009, at 1:17 PM, Holly Dail wrote:
>>
>>> I don't know how to check the testreport, but I did run the
>>> tutorial. I had to make two changes: I switched to synchronous
>>> time stepping (1800 sec) based on Patrick's advice, and I set the
>>> tutorial to run for a year.
>>>
>>> The adjoint doesn't blow up, but I am still struggling with some
>>> inconsistencies:
>>> - if I use a March 2009 version of the GCM, optim.x ends with
>>> iter0 at a cost of 14.66 and iter1 at a cost of 12.04; optim.x
>>> stops because the maximum number of iterations is reached, which
>>> seems in line with what you got when you developed this tutorial.
>>>
>>> - if I use a current version of the GCM, optim.x again ends with
>>> iter0 at a cost of 14.66, but iter1 fails to reduce the cost, with
>>> the message 'the search direction is not a descent one'.
>>>
>>> I recompiled and retested the March version this week to make sure
>>> this holds even with fresh compiles of both, and indeed it does.
>>> I'm not sure what could have changed in the GCM; I couldn't find
>>> anything significant in the tutorial code or config, lsopt, optim,
>>> or any of the packages I thought to check.
>>>
>>> Thanks,
>>> Holly
>>>
>>> On Aug 12, 2009, at 12:49 PM, David Ferreira wrote:
>>>
>>>> Holly,
>>>> Just to be sure: does the testreport of tutorial_global_oce_optim
>>>> run fine for you?
>>>> david
>>>>
>>>>
>>>> Holly Dail wrote:
>>>>> Thanks for the advice, Matt.
>>>>>
>>>>> I'm not using the divided adjoint, but I'll try the
>>>>> autodiff_inadmode_set.F approach.
>>>>>
>>>>> Here are the viscosities / diffusivities (chosen to match those
>>>>> used in ECCO almost exactly):
>>>>> viscAz=1.E-3,
>>>>> viscAh=1.E4,
>>>>> diffKhT=100.,
>>>>> diffKzT=2.E-5,
>>>>> diffKhS=100.,
>>>>> diffKzS=1.E-5,
>>>>>
>>>>> I used your advection scheme based on your earlier advice, but I
>>>>> haven't tried
>>>>>> multiDimAdvection=.FALSE.,
>>>>> I will try that too.
>>>>>
>>>>> My time step is 3600 sec - again, the same as ECCO.
>>>>>
>>>>> Thanks -
>>>>> Holly
>>>>>
>>>>>
>>>>> On Aug 12, 2009, at 11:42 AM, Matthew Mazloff wrote:
>>>>>
>>>>>> Hi Holly,
>>>>>>
>>>>>> Your adjoint is definitely blowing up (how many timesteps is
>>>>>> your grad check? ... it's blowing up fast). Try turning off
>>>>>> packages when you run the adjoint and see if that helps. Are
>>>>>> you using the divided adjoint? If so, you can just change some
>>>>>> things to false in data.pkg when it's about to start. Turn off
>>>>>> KPP and GMREDI and packages of that nature. If you are not
>>>>>> using the divided adjoint, then you have to use
>>>>>> autodiff_inadmode_set.F to turn these things off. In this file,
>>>>>> just set
>>>>>> usePtracers = .FALSE.
>>>>>> useKPP = .FALSE.
>>>>>> useGMREDI = .FALSE.
>>>>>> useSEAICE = .FALSE.
>>>>>>
>>>>>> Then try again.
>>>>>>
>>>>>> -Matt
>>>>>>
>>>>>> ps> out of curiosity, what viscosity and diffusivity are you
>>>>>> trying to run the adjoint with?
>>>>>>
>>>>>> Oh, and also some of the advection schemes may not be stable.
>>>>>> I am using
>>>>>> multiDimAdvection=.FALSE.,
>>>>>> tempAdvScheme=30,
>>>>>> saltAdvScheme=30,
>>>>>>
>>>>>>
>>>>>> pps> of course the real expert is just upstairs from you -- bug
>>>>>> him :o)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Aug 12, 2009, at 8:14 AM, Holly Dail wrote:
>>>>>>
>>>>>>> Hello all -
>>>>>>>
>>>>>>> I'd like to use optimization with a regional North Atlantic
>>>>>>> setup. As a first case, I started with the approach laid out
>>>>>>> in tutorial_global_oce_optim --
>>>>>>> - cost based on (1) the divergence of the model's annual-mean
>>>>>>> surface temperatures from climatology and (2) a penalty
>>>>>>> keeping the control vector at a reasonable magnitude
>>>>>>> - control is a time-mean heat flux correction (a 2-d field)
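>>>>>>> In equation form (my shorthand, not the tutorial's exact
>>>>>>> weights): J = sum_ij W_T*(Tbar_ij - Tclim_ij)^2
>>>>>>> + sum_ij W_Q*dQ_ij^2, with Tbar the model annual-mean SST,
>>>>>>> Tclim the climatology, and dQ the heat-flux correction.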
>>>>>>>
>>>>>>> My sensitivities are astronomical (adxx values of order
>>>>>>> 10^16), the gradient check seems to fail (as shown below, the
>>>>>>> finite-difference gradients seem okay, the adjoint gradients
>>>>>>> not so much), and optim.x fails with the message 'the
>>>>>>> linesearch failed'.
>>>>>>>
>>>>>>> (PID.TID 0000.0001) grdchk output: procId  I  ITILEPOS  JTILEPOS  LAYER  X(I)             X(I)+/-EPS
>>>>>>> (PID.TID 0000.0001) grdchk output: FC               FC1              FC2              FC1-FC2/(2*EPS)  ADJ GRAD(FC)     1-FDGRD/ADGRD
>>>>>>> (PID.TID 0000.0001) grdchk output: 0  1  56  35  1  0.000000000D+00  -.100000000D+00
>>>>>>> (PID.TID 0000.0001) grdchk output: 0.261232434D+02  0.261232444D+02  0.261232340D+02  0.523051129D-04  -.115313924+108  0.100000000D+01
>>>>>>>
>>>>>>> I suppose this may mean the adjoint is blowing up? (Reading
>>>>>>> the last row: the finite-difference gradient is ~5.2e-5 while
>>>>>>> the adjoint gradient is ~ -1.2e+108, so 1-FDGRD/ADGRD is
>>>>>>> essentially 1.) I've tried reducing my time step and
>>>>>>> increasing the viscosity, and I checked that my climatology &
>>>>>>> error fields are defined at all wet points; are there other
>>>>>>> fixes folks have had success with? Also, if you have scripts
>>>>>>> that you use to diagnose your optimization runs, those would
>>>>>>> be really appreciated.
>>>>>>>
>>>>>>> Thanks -
>>>>>>> Holly
>>>>>>
>>>>>
>>>>
>>>
>>
>> ---
>> Patrick Heimbach | heimbach at mit.edu | http://www.mit.edu/~heimbach
>> MIT | EAPS 54-1518 | 77 Massachusetts Ave | Cambridge MA 02139 USA
>> FON +1-617-253-5259 | FAX +1-617-253-4464 | SKYPE patrick.heimbach
>>
>>
>