[MITgcm-devel] [MITgcm-support] MITgcm-support Digest, Vol 141, Issue 6
Patrick Heimbach
heimbach at mit.edu
Wed Mar 4 12:06:03 EST 2015
Hi,
a quick note also from me.
I didn’t quite understand where this discussion was going,
especially given that in his simulation Jonny had set debugLevel=-1.
It seems like the first thing to do (before trying anything fancy),
if you’re trying to debug something or are not sure what you’re doing,
is *not* to set that to -1.
For the default debug level (+1) you can always invert for :) the number of timesteps via
grep 'cg2d:' output.txt | wc -l
as long as the computer that you’re running on is not buffering output
(some IBMs like to do that).
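The same count can be done outside the shell. The sketch below is illustrative: the sample text is made-up, not real MITgcm output, and the trick assumes the model prints one 'cg2d:' line per timestep.

```python
# Count timestep monitor lines the way the grep pipeline above does.
# The sample text is illustrative, not real MITgcm output; the trick
# assumes one 'cg2d:' line appears per timestep.
sample = """\
(PID.TID 0000.0001) cg2d: Sum(rhs),rhsMax = 0.000000E+00 1.2E-03
(PID.TID 0000.0001) cg2d: Sum(rhs),rhsMax = 0.000000E+00 9.8E-04
(PID.TID 0000.0001) some other monitor line
(PID.TID 0000.0001) cg2d: Sum(rhs),rhsMax = 0.000000E+00 7.5E-04
"""

n_steps = sum(1 for line in sample.splitlines() if "cg2d:" in line)
print(n_steps)  # 3
```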
-Patrick
On Mar 4, 2015, at 10:48 AM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
> Hi Martin,
>
> switching to mitgcm-devel:
>
> I am not completely sure how useful this is:
> 1) it still relies on catching the problem at monFreq, meaning you might get NaNs before
> mon_solution can do anything.
> 2) when the Temp range is over the limit (1000 K for an ocean set-up), it is
> sometimes too late to detect where the problem comes from.
> 3) with non-linear free-surface, the run will generally stop before Temp is over the limit
> (because of a too-small hFac), so it is not very clear where, and under which condition,
> to put this in the code.
> Regarding when to decide to stop, there have been other ideas around
> (using some global sum / global max from cg2d - checking CFL values?)
> but I am not sure how safely and precisely this would work.
> But even with a better criterion, I am not sure how to diagnose where the problem
> comes from without re-running to get a pickup close enough (but not too close)
> to the stop and then doing a short run with plenty of output.
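A minimal, hypothetical sketch of the CFL-based criterion mentioned above. None of these names come from MITgcm, and the 0.5 threshold is an arbitrary illustration, not a recommended value:

```python
# Sketch of a CFL-based "is this run about to blow up?" check.
# A run is flagged when max|u| * deltaT / dx exceeds a chosen threshold.
# Field layout, deltaT, dx and the 0.5 threshold are all assumptions
# for illustration, not MITgcm defaults.
def cfl_exceeded(u, deltaT, dx, cfl_max=0.5):
    """Return True if the advective CFL number max|u|*deltaT/dx > cfl_max."""
    umax = max(abs(v) for row in u for v in row)
    return umax * deltaT / dx > cfl_max

# A single 10 m/s velocity spike, deltaT = 100 s:
u = [[0.0] * 10 for _ in range(10)]
u[3][4] = 10.0
print(cfl_exceeded(u, deltaT=100.0, dx=10.0e3))  # False (CFL = 0.1)
print(cfl_exceeded(u, deltaT=100.0, dx=1.0e3))   # True  (CFL = 1.0)
```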
>
> Cheers,
> Jean-Michel
>
> On Wed, Mar 04, 2015 at 08:56:34AM +0100, Martin Losch wrote:
>> Thanks Paul,
>>
>> this or something like this might be a useful addition to the main repository. I’ll think about this (when I get some time to breathe).
>>
>> Martin
>>
>>> On 03 Mar 2015, at 19:24, Holland, Paul R. <pahol at bas.ac.uk> wrote:
>>>
>>> Hi Jonny, Martin, etc.
>>>
>>> The monumentally useful hack in question was written by Nicolas Bruneau back when he was still cheap enough to work for me. All you do is edit mon_solution.F so that when it discovers a crash it dumps all the fields you like into a file named with the crash timestep. I just run everything with this on, just in case. This still requires you to determine the crash by monitoring, so it is subject to all of Martin's comments below. See the code snippet below.
>>>
>>> Cheers,
>>>
>>> Paul
>>>
>>> *** snippets from hacked mon_solution.F ***
>>>
>>> ...
>>>
>>> C     !LOCAL VARIABLES:
>>>       CHARACTER*(MAX_LEN_MBUF) msgBuf
>>>       CHARACTER*(MAX_LEN_MBUF) suff
>>>       _RL tMin,tMax,tMean,tSD,tDel2,tVol
>>>
>>> ...
>>>
>>> C--   Report and stop when the Pot.Temp range exceeds the allowed limit
>>>       IF ( (tMax-tMin).GT.monSolutionMaxRange ) THEN
>>>        _BEGIN_MASTER(myThid)
>>>        WRITE(msgBuf,'(A,1P2E11.3)')
>>>      &  'SOLUTION IS HEADING OUT OF BOUNDS: tMin,tMax=',tMin,tMax
>>>        CALL PRINT_MESSAGE(msgBuf,errorMessageUnit,SQUEEZE_RIGHT,myThid)
>>>        WRITE(msgBuf,'(2A,1PE11.3,A)') ' exceeds allowed range ',
>>>      &  '(monSolutionMaxRange=', monSolutionMaxRange,')'
>>>        CALL PRINT_MESSAGE(msgBuf,errorMessageUnit,SQUEEZE_RIGHT,myThid)
>>>        WRITE(msgBuf,'(A,I10)')
>>>      &  'MON_SOLUTION: STOPPING CALCULATION at Iter=', myIter
>>>        CALL PRINT_MESSAGE(msgBuf,errorMessageUnit,SQUEEZE_RIGHT,myThid)
>>>        _END_MASTER(myThid)
>>>
>>> C--   Dump each state field to a file tagged with the crash iteration
>>>        WRITE(suff,'(I10.10)') myIter
>>>        CALL WRITE_FLD_XYZ_RL('stateThetacrash.',suff,theta,myIter,myThid)
>>>        CALL WRITE_FLD_XYZ_RL('stateSaltcrash.', suff,salt, myIter,myThid)
>>>        CALL WRITE_FLD_XY_RL ('stateEtacrash.',  suff,etaN, myIter,myThid)
>>>        CALL WRITE_FLD_XYZ_RL('stateUvelcrash.', suff,uVel, myIter,myThid)
>>>        CALL WRITE_FLD_XYZ_RL('stateVvelcrash.', suff,vVel, myIter,myThid)
>>>        CALL WRITE_FLD_XYZ_RL('stateWvelcrash.', suff,wVel, myIter,myThid)
>>>
>>>        CALL ALL_PROC_DIE( myThid )
>>>        STOP
>>>      & 'ABNORMAL END: S/R MON_SOLUTION, stops due to EXTREME Pot.Temp'
>>>       ENDIF
>>>
>>> *** end snippet ***
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Tue, 3 Mar 2015 15:16:34 +0100
>>> From: Martin Losch <Martin.Losch at awi.de>
>>> To: MITgcm Support <mitgcm-support at mitgcm.org>
>>> Subject: Re: [MITgcm-support] how many timesteps?
>>> Message-ID: <DE6306E6-B9A7-4965-B34A-D56550781479 at awi.de>
>>> Content-Type: text/plain; charset="utf-8"
>>>
>>> Hi there,
>>>
>>> the monitor package is indeed quite useful, but it will only give you the statistics of a snapshot every monitorFreq seconds. So if your time step is deltaT=100 and your monitorFreq = 1000, then you'll get output every 10 time steps. If the model dies in between, there will be no extra output, because the monitor package is called only every 10 time steps. Setting monitorFreq = 100, or any value <= deltaT, will give you output every timestep, but will make the model very slow, so it is useful only if you have to run a relatively small number of time steps before the model crashes.
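Martin's arithmetic, spelled out. The variable names below are plain Python for illustration, not namelist parameters, and the model time in the second part is hypothetical:

```python
# Monitor output appears every monitorFreq seconds of model time,
# i.e. every monitorFreq/deltaT timesteps.  Numbers from the example above.
deltaT = 100.0        # model timestep [s]
monitorFreq = 1000.0  # monitor output interval [s]

steps_between_output = int(monitorFreq // deltaT)
print(steps_between_output)  # 10

# Conversely, the iteration count implied by the model time of the
# last monitor line (the time value here is hypothetical):
last_monitor_time = 36000.0
print(int(last_monitor_time // deltaT))  # 360
```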
>>>
>>> I remember that Paul Holland once implemented a hack to have the model output some stuff before it dies. Paul, maybe you can share this code.
>>>
>>> Jonny, in any case you'll have to rerun your model with a smaller monitorFreq. The debugLevel doesn't have anything to do with this.
>>>
>>> Martin
>>>
>>>> On 03 Mar 2015, at 13:42, Jonny Williams <Jonny.Williams at bristol.ac.uk> wrote:
>>>>
>>>> Thanks very much for that Ed
>>>>
>>>> Unfortunately that output isn't present in my STDOUT* files. I think this is because I have debugMode=.FALSE., in my eedata file and debugLevel=-1, in my data file?
>>>>
>>>> Cheers
>>>>
>>>> Jonny
>>>>
>>>>
>>>> On 3 March 2015 at 12:22, Edward Doddridge <edward.doddridge at magd.ox.ac.uk> wrote:
>>>> Hi Jonny,
>>>>
>>>> The STDERR.00* files should show you how many iterations the model ran for before it died. Here's an example from a run that stopped after 20 timesteps.
>>>>
>>>> (PID.TID 0000.0001) SOLUTION IS HEADING OUT OF BOUNDS: tMin,tMax= -1.255E+03 1.589E+03
>>>> (PID.TID 0000.0001) exceeds allowed range (monSolutionMaxRange= 1.000E+03)
>>>> (PID.TID 0000.0001) MON_SOLUTION: STOPPING CALCULATION at Iter= 20
>>>> (PID.TID 0000.0001) *** ERROR *** S/R ALL_PROC_DIE: ending the run
>>>>
>>>> If you want to monitor how a run is going before it dies, the monitor package is the easiest way I know of. "monitorFreq" in PARAM03 in the data file sets the frequency of the output. It's a very cheap way to dump data regularly. If you know deltaT and the monitorFreq, then you can work out the number of iterations the model has done.
>>>>
>>>> Best,
>>>> Ed
>>>>
>>>> ________________________________
>>>> Edward Doddridge
>>>> Doctoral Student
>>>> Atmospheric, Oceanic and Planetary Physics University of Oxford
>>>>
>>>> www.doddridge.me
>>>
>>>
>>>
>>> _______________________________________________
>>> MITgcm-support mailing list
>>> MITgcm-support at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>>
>>
>
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel
---
Patrick Heimbach | heimbach at mit.edu | http://www.mit.edu/~heimbach
MIT | EAPS 54-1420 | 77 Massachusetts Ave | Cambridge MA 02139 USA
FON +1-617-253-5259 | FAX +1-617-253-4464 | SKYPE patrick.heimbach