[MITgcm-support] Reproducibility of blowup

Martin Losch Martin.Losch at awi.de
Mon Jun 11 12:04:38 EDT 2018


Hi Kaitlin,

Restarts are tested continuously (does a run of 2 + 2 time steps give the same result as a run of 4 time steps?), so you can be fairly sure that restarts work.
Having said that, I think there is a difference between storing and reading 64-bit data (real*8 in Fortran) and keeping that data in memory. My impression is that more precision is available internally, and that it is rounded off when you write a pickup, which leads to the behavior that you described. I have seen the same thing on a vector computer, and I never got to the bottom of it. Your model is probably only marginally stable, so small changes can alter a blowup situation. You probably need to increase the viscosity a little or reduce your time step.
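
To illustrate how a rounding difference at a restart can change what a marginally stable run does later, here is a minimal Python sketch. It is only an analogy, not MITgcm code: the logistic map stands in for a nonlinear model step, and rounding the state to single precision stands in for whatever precision is lost when a pickup is written and read back. The names step, round_to_single and run are made up for this example.

```python
import struct

def step(x, r=3.9):
    """One step of the logistic map (hypothetical stand-in for a model time step)."""
    return r * x * (1.0 - x)

def round_to_single(x):
    """Round a double to single precision and back.

    Assumption: this mimics losing a few bits when the state goes
    through a restart file; it is not what MITgcm actually does.
    """
    return struct.unpack("f", struct.pack("f", x))[0]

def run(x0, nsteps, restart_at=None):
    """Integrate nsteps steps; optionally round the state once before step restart_at."""
    x = x0
    traj = []
    for n in range(nsteps):
        if n == restart_at:
            x = round_to_single(x)  # the simulated "restart"
        x = step(x)
        traj.append(x)
    return traj

continuous = run(0.2, 200)
restarted = run(0.2, 200, restart_at=50)

# The two runs agree exactly up to the restart, then drift apart.
first_diff = next(
    (n for n, (a, b) in enumerate(zip(continuous, restarted)) if a != b), None
)
late_diff = max(abs(a - b) for a, b in zip(continuous[100:], restarted[100:]))
print("first differing step:", first_diff)
print("largest difference after step 100:", late_diff)
```

Running the same configuration twice without a restart gives identical numbers, so the sketch is reproducible in the same sense as the 10-minute test. Only the rounding at the restart separates the two trajectories, and in a chaotic or marginally stable system that is enough to move or remove a later blowup.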

Martin



> On 11. Jun 2018, at 14:51, Naughten, Kaitlin A. <kaight at bas.ac.uk> wrote:
> 
> Hello,
> 
> I am trying to diagnose a blowup in my simulation by first reducing the checkpoint frequency (so I can get as close as possible to the blowup on successive restarts) and then setting the monitor frequency to monitor every timestep, so I can catch the exact timestep when the model blows up and output some fields to figure out what's going on.
> 
> Reducing the checkpoint frequency from monthly to daily worked just fine, giving me the pickup file within 1 day of the blowup. Now I've reduced the checkpoint frequency to hourly, meaning I would expect the model to blow up at some point before 24 checkpoints are written. However, the model has now written 335 hourly checkpoints and counting (almost 14 days), meaning it's soared past the point where it originally blew up in the daily-checkpoint simulation!
> 
> I've already checked that the model is bit-reproducible at least for the first 10 minutes of a simulation, and that it remains bit-reproducible when I change the checkpoint frequency. So there's nothing wrong with my compilers in that respect. I'm guessing the key difference here is that I'm restarting from a checkpoint. Should I expect that process to be bit-reproducible? In other words, is it suspicious that my blowup disappears when I stop and restart the model?
> 
> Many thanks,
> Kaitlin
> 


