[MITgcm-support] MITgcm-support Digest, Vol 141, Issue 6

Holland, Paul R. pahol at bas.ac.uk
Tue Mar 3 13:24:16 EST 2015


Hi Jonny, Martin, etc.

The monumentally useful hack in question was written by Nicolas Bruneau back when he was still cheap enough to work for me. All you do is edit mon_solution.F so that, when it detects a crash, it dumps all the fields you like into files named with the crash time step. I just run everything with this switched on, just in case. This still relies on the monitor package to detect the crash, so it is subject to all of Martin's comments below. See the code snippet below.

Cheers,

Paul

*** snippets from hacked mon_solution.F ***

...

C     !LOCAL VARIABLES:
      CHARACTER*(MAX_LEN_MBUF) msgBuf
      CHARACTER*(MAX_LEN_MBUF) suff
      _RL tMin,tMax,tMean,tSD,tDel2,tVol

...

      IF ( (tMax-tMin).GT.monSolutionMaxRange ) THEN
        _BEGIN_MASTER(myThid)
        WRITE(msgBuf,'(A,1P2E11.3)')
     &    'SOLUTION IS HEADING OUT OF BOUNDS: tMin,tMax=',tMin,tMax
        CALL PRINT_MESSAGE(msgBuf,errorMessageUnit,SQUEEZE_RIGHT,myThid)
        WRITE(msgBuf,'(2A,1PE11.3,A)') '  exceeds allowed range ',
     &             '(monSolutionMaxRange=', monSolutionMaxRange,')'
        CALL PRINT_MESSAGE(msgBuf,errorMessageUnit,SQUEEZE_RIGHT,myThid)
        WRITE(msgBuf,'(A,I10)')
     &    'MON_SOLUTION: STOPPING CALCULATION at Iter=', myIter
        CALL PRINT_MESSAGE(msgBuf,errorMessageUnit,SQUEEZE_RIGHT,myThid)
        _END_MASTER(myThid)

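C       Dump the model state at the crash iteration; suff holds the
C       10-digit, zero-padded iteration number, so output file names
C       start with e.g. stateThetacrash.0000000020 (theta, salt, etaN,
C       uVel, vVel and wVel are declared in DYNVARS.h)
        WRITE(suff,'(I10.10)') myIter
        CALL WRITE_FLD_XYZ_RL( 'stateThetacrash.', suff, theta,
     &                         myIter, myThid )
        CALL WRITE_FLD_XYZ_RL( 'stateSaltcrash.', suff, salt,
     &                         myIter, myThid )
        CALL WRITE_FLD_XY_RL( 'stateEtacrash.', suff, etaN,
     &                        myIter, myThid )
        CALL WRITE_FLD_XYZ_RL( 'stateUvelcrash.', suff, uVel,
     &                         myIter, myThid )
        CALL WRITE_FLD_XYZ_RL( 'stateVvelcrash.', suff, vVel,
     &                         myIter, myThid )
        CALL WRITE_FLD_XYZ_RL( 'stateWvelcrash.', suff, wVel,
     &                         myIter, myThid )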

        CALL ALL_PROC_DIE( myThid )
        STOP
     &  'ABNORMAL END: S/R MON_SOLUTION, stops due to EXTREME Pot.Temp'
      ENDIF

*** end snippet ***


----------------------------------------------------------------------

Message: 1
Date: Tue, 3 Mar 2015 15:16:34 +0100
From: Martin Losch <Martin.Losch at awi.de>
To: MITgcm Support <mitgcm-support at mitgcm.org>
Subject: Re: [MITgcm-support] how many timesteps?
Message-ID: <DE6306E6-B9A7-4965-B34A-D56550781479 at awi.de>
Content-Type: text/plain; charset="utf-8"

Hi there,

the monitor package is indeed quite useful, but it only gives you the statistics of a snapshot every 'monitorFreq' seconds. So if your time step is deltaT=100 and your monitorFreq = 1000, then you'll get output every 10 time steps. If the model dies in between, there will be no extra output, because the monitor package is called only every 10 time steps. Setting monitorFreq = 100 (i.e. equal to deltaT) or any smaller value will give you output every time step, but will make the model very slow. It is therefore useful only if the model crashes after a relatively small number of time steps.
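
As a sketch for the example above: in the standard MITgcm setup, deltaT and monitorFreq both sit in the PARM03 namelist of the data file. Assuming deltaT=100 and that you want monitor output at every time step, the relevant entries might look like the excerpt below. All other PARM03 entries are omitted, and the '#' lines are comments.

*** sketch: excerpt from the data file ***
 &PARM03
# model time step in seconds
 deltaT=100.,
# monitorFreq <= deltaT: monitor output every time step (slow)
 monitorFreq=100.,
 &
*** end sketch ***

Once the crash has been located, setting monitorFreq back to 1000. returns to output every 10 time steps.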

I remember that Paul Holland once implemented a hack to have the model output some stuff before it dies. Paul, maybe you can share this code.

Jonny, in any case you'll have to rerun your model with a smaller monitorFreq. The debugLevel doesn't have anything to do with this.

Martin

> On 03 Mar 2015, at 13:42, Jonny Williams <Jonny.Williams at bristol.ac.uk> wrote:
>
> Thanks very much for that Ed
>
> Unfortunately that output isn't present in my STDOUT* files. I think this is because I have debugMode=.FALSE., in my eedata file and debugLevel=-1, in my data file?
>
> Cheers
>
> Jonny
>
>
> On 3 March 2015 at 12:22, Edward Doddridge <edward.doddridge at magd.ox.ac.uk> wrote:
> Hi Jonny,
>
> The STDERR.00* files should show you how many iterations the model ran for before it died. Here's an example from a run that stopped after 20 timesteps.
>
> (PID.TID 0000.0001) SOLUTION IS HEADING OUT OF BOUNDS: tMin,tMax= -1.255E+03  1.589E+03
> (PID.TID 0000.0001)   exceeds allowed range (monSolutionMaxRange=  1.000E+03)
> (PID.TID 0000.0001) MON_SOLUTION: STOPPING CALCULATION at Iter=        20
> (PID.TID 0000.0001) *** ERROR *** S/R ALL_PROC_DIE: ending the run
>
> If you want to monitor how a run is going before it dies, the monitor package is the easiest way I know of. "monitorFreq" in PARM03 in the data file sets the frequency of the output. It's a very cheap way to dump data regularly. If you know deltaT and monitorFreq, then you can work out the number of iterations the model has done.
>
> Best,
> Ed
>
> ________________________________
> Edward Doddridge
> Doctoral Student
> Atmospheric, Oceanic and Planetary Physics University of Oxford
>
> www.doddridge.me




