[MITgcm-support] Reproducibility of blowup

Jean-Michel Campin jmc at mit.edu
Mon Jun 11 14:03:51 EDT 2018


Hi,

As Martin wrote, restarts are tested on a daily basis, and they generally work (i.e., "pass" the test)
even with compiler optimisations turned on. For instance, on the "engaging" cluster,
with the gfortran compiler:
 http://mitgcm.org/testing/results/2018_06/rs_engaging1-mpi-fast_20180611_0/summary.txt
and with the Intel compiler:
 http://mitgcm.org/testing/results/2018_06/rs_engaging1-ifort-fast_20180611_0/summary.txt
The few experiments that don't pass the test are known to have restart issues (e.g., with pkg/fizhi).
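The idea behind such a restart test can be sketched in Python: run a model straight through, run it again with a stop after the first few steps, write a pickup, read it back and continue, then compare the two end states bit for bit. The toy `step` function is a hypothetical stand-in for the model, and the big-endian real*8 pickup layout is an assumption for illustration; this is not the MITgcm test harness.

```python
import struct


def step(state: list[float]) -> list[float]:
    """One time step of a hypothetical toy model (stand-in for the real model)."""
    # Arbitrary nonlinear update, chosen for illustration only.
    u, v = state
    return [u + 0.01 * (v - u * u), v - 0.01 * (u * v)]


def run(state: list[float], nsteps: int) -> list[float]:
    for _ in range(nsteps):
        state = step(state)
    return state


def write_pickup(state: list[float]) -> bytes:
    # Assumed layout: each value stored as a big-endian 64-bit real (real*8).
    return struct.pack(f">{len(state)}d", *state)


def read_pickup(blob: bytes) -> list[float]:
    return list(struct.unpack(f">{len(blob) // 8}d", blob))


def restart_test(initial: list[float], n1: int, n2: int) -> bool:
    """Compare an uninterrupted run of n1 + n2 steps with one restarted after n1 steps."""
    continuous = run(initial, n1 + n2)
    pickup = write_pickup(run(initial, n1))
    restarted = run(read_pickup(pickup), n2)
    # float.hex exposes every bit of the value, so this is a bitwise comparison.
    return [x.hex() for x in continuous] == [x.hex() for x in restarted]


print(restart_test([0.5, 1.0], 2, 2))  # 2 + 2 steps vs 4 steps -> True
```

Because this pickup stores exactly the 64-bit values held in memory, the restarted run repeats the same arithmetic and the comparison succeeds; anything that perturbs the state at the restart (a code path that differs between the two runs, or values rounded when written) makes it fail.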

But occasionally, with some specific compiler options, a restart test might be broken,
like here: http://mitgcm.org/testing/results/2018_05/rs_svante-pgiMPI_20180518_0/summary.txt
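One reason bit-for-bit agreement depends on compiler settings is that floating-point addition is not associative, and aggressive optimisation may allow the compiler to reorder operations (for example, when vectorising a sum). The Python sketch below adds the same four numbers in two orders; the "reordered" version only imitates what an optimiser might do and is not taken from any particular compiler.

```python
import math

values = [1e16, 1.0, -1e16, 1.0]

# Left-to-right summation, in the order the source code is written.
sequential = 0.0
for x in values:
    sequential += x  # 1e16 + 1.0 rounds back to 1e16, so one 1.0 is lost

# A different association, imitating an optimiser that keeps separate partial
# sums for even- and odd-indexed elements (illustrative only).
partial_even = values[0] + values[2]
partial_odd = values[1] + values[3]
reordered = partial_even + partial_odd

print(sequential, reordered, math.fsum(values))  # 1.0 2.0 2.0
```

`math.fsum` returns the correctly rounded sum, so here the reordered result happens to be the accurate one; the point is only that the two orders disagree in the final bits (or more).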

Cheers,
Jean-Michel

On Mon, Jun 11, 2018 at 04:51:10PM +0000, Menemenlis, Dimitris (329C) wrote:
> Just to be sure: are you running with all compiler optimization flags off, e.g., by using the -ieee option when you generate the Makefile? Bit reproducibility is not guaranteed when compiler optimization flags are turned on.
> 
> > On Jun 11, 2018, at 9:04 AM, Martin Losch <Martin.Losch at awi.de> wrote:
> > 
> > Hi Kaitlin,
> > 
> > the restarts are continuously tested (I believe a 2 + 2 timestep run is compared with a 4-timestep run), and you can be pretty sure that they work.
> > Having said that, I think that there is a difference between storing and reading 64-bit data (real*8 in Fortran) and having this data in memory. I have the impression that more precision is available internally, which is rounded when you write a pickup, and this leads to the behavior that you described. I have experienced the same thing on a vector computer, and I have not gotten to the bottom of it. Your model is probably marginally stable, and small changes can alter a blowup situation. You probably need to increase the viscosity a little or reduce your time step.
> > 
> > Martin
> > 
> > 
> > 
> >> On 11. Jun 2018, at 14:51, Naughten, Kaitlin A. <kaight at bas.ac.uk> wrote:
> >> 
> >> Hello,
> >> 
> >> I am trying to diagnose a blowup in my simulation by first reducing the checkpoint interval (so I can get as close as possible to the blowup on successive restarts) and then setting the monitor frequency to every timestep, so I can catch the exact timestep when the model blows up and output some fields to figure out what's going on.
> >> 
> >> Reducing the checkpoint interval from monthly to daily worked just fine, giving me a pickup file within 1 day of the blowup. Now I've reduced the checkpoint interval to hourly, so I would expect the model to blow up before 24 checkpoints are written. However, the model has now written 335 hourly checkpoints and counting (almost 14 days), meaning it has soared past the point where it originally blew up in the daily-checkpoint simulation!
> >> 
> >> I've already checked that the model is bit-reproducible at least for the first 10 minutes of a simulation, and that it remains bit-reproducible when I change the checkpoint frequency. So there's nothing wrong with my compilers in that respect. I'm guessing the key difference here is that I'm restarting from a checkpoint. Should I expect that process to be bit-reproducible? In other words, is it suspicious that my blowup disappears when I stop and restart the model?
> >> 
> >> Many thanks,
> >> Kaitlin
> >> 
> >> _______________________________________________
> >> MITgcm-support mailing list
> >> MITgcm-support at mitgcm.org
> >> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
> > 
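Martin's hypothesis above, that values are held in memory with more precision than the real*8 pickup can store, can be illustrated with a chaotic toy system. In the Python sketch below, the logistic map stands in for a marginally stable model, and rounding the state to 32-bit precision at the restart is a deliberately exaggerated, hypothetical stand-in for rounding extended-precision values to 64 bits; neither is how MITgcm itself works.

```python
import struct


def logistic(x: float) -> float:
    # Chaotic logistic map (r = 4): a stand-in for a marginally stable model,
    # in which tiny perturbations grow roughly exponentially.
    return 4.0 * x * (1.0 - x)


def run(x: float, nsteps: int) -> float:
    for _ in range(nsteps):
        x = logistic(x)
    return x


def round_to_float32(x: float) -> float:
    # Hypothetical stand-in for the precision lost when an in-memory value
    # is rounded to the precision stored in the pickup file.
    return struct.unpack("f", struct.pack("f", x))[0]


x0, n1, n2 = 0.3, 30, 40
mid_state = run(x0, n1)

continuous = run(x0, n1 + n2)
exact_restart = run(mid_state, n2)                      # pickup keeps every bit
lossy_restart = run(round_to_float32(mid_state), n2)    # pickup rounds the state

print(f"perturbation at restart: {abs(round_to_float32(mid_state) - mid_state):.1e}")
print(f"continuous:    {continuous:.6f}")
print(f"exact restart: {exact_restart:.6f}")
print(f"lossy restart: {lossy_restart:.6f}")
```

The exact restart reproduces the continuous run, while the rounded one typically drifts to an unrelated state within a few dozen steps. With a smaller rounding error the divergence takes longer but is qualitatively the same, which is consistent with a blowup that appears in one run and disappears after a restart.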

