<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p style="margin-top:0;margin-bottom:0">Thanks everyone for this info. I set up my own quick 2 + 2 = 4 test for my specific configuration, and it failed (i.e. the final pickups were different), even when I added the compiler flag -O0 to switch off optimisation.
Here are my other compiler flags:</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0"><span>FFLAGS='-h byteswapio -assume byterecl -convert big_endian -fpe0 -g -traceback'</span><br>
</p>
<p style="margin-top:0;margin-bottom:0"><span><br>
</span></p>
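<p style="margin-top:0;margin-bottom:0"><span>For anyone wanting to repeat the pickup comparison, here is a minimal Python sketch of one way it could be done. It assumes the pickups are flat binary files of big-endian 64-bit reals; the function names and file paths are hypothetical and not part of MITgcm.</span></p>
<pre>
```python
import numpy as np


def count_bit_differences(a, b):
    """Return (number of elements whose 64-bit patterns differ,
    largest absolute difference among those elements)."""
    a = np.ascontiguousarray(a, dtype=">f8")
    b = np.ascontiguousarray(b, dtype=">f8")
    if a.shape != b.shape:
        raise ValueError("arrays have different shapes")
    # Compare raw bit patterns rather than values, so that identical NaNs
    # count as equal and -0.0 is distinguished from 0.0.
    differing = a.view(">u8") != b.view(">u8")
    n_diff = int(np.count_nonzero(differing))
    max_abs = float(np.max(np.abs(a[differing] - b[differing]))) if n_diff else 0.0
    return n_diff, max_abs


def compare_pickups(path_a, path_b):
    # Assumption: both files are flat big-endian float64 binaries of equal size.
    return count_bit_differences(
        np.fromfile(path_a, dtype=">f8"),
        np.fromfile(path_b, dtype=">f8"),
    )
```
</pre>
<p style="margin-top:0;margin-bottom:0"><span>A call such as compare_pickups("run_4/pickup.0000000004.data", "run_2plus2/pickup.0000000004.data") (hypothetical paths) returns the number of differing values and the largest absolute difference; a count of zero would mean the 2 + 2 = 4 test passes.</span></p>
<p style="margin-top:0;margin-bottom:0"><span><br>
</span></p>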
<p style="margin-top:0;margin-bottom:0"><span>As I mentioned before, my configuration is bit-reproducible in non-restart circumstances (i.e. 4 = 4), even without -O0.</span></p>
<p style="margin-top:0;margin-bottom:0"><span><br>
</span></p>
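<p style="margin-top:0;margin-bottom:0"><span>Martin's suggestion below, that the state is rounded when a pickup is written, would explain how 4 = 4 can hold while 2 + 2 = 4 fails. The toy Python sketch here illustrates that mechanism only: the logistic map stands in for a model timestep, and float32 stands in for a storage precision lower than the in-memory precision. Both are assumptions made for illustration, not a description of what MITgcm or my compiler actually does.</span></p>
<pre>
```python
import numpy as np


def step(x, r=3.9):
    # One step of the logistic map, a stand-in for a single model timestep.
    return r * x * (1.0 - x)


def run(x, nsteps):
    for _ in range(nsteps):
        x = step(x)
    return x


def run_with_restart(x, nsteps_first, nsteps_second, storage_type):
    x = run(x, nsteps_first)
    # "Write and read the pickup": round the state to the storage precision.
    x = float(storage_type(x))
    return run(x, nsteps_second)


x0 = 0.3
straight = run(x0, 4)                                      # the "4" run
same_precision = run_with_restart(x0, 2, 2, np.float64)    # lossless pickup
lower_precision = run_with_restart(x0, 2, 2, np.float32)   # lossy pickup

print("lossless restart matches:", straight == same_precision)
print("lossy restart matches:   ", straight == lower_precision)
```
</pre>
<p style="margin-top:0;margin-bottom:0"><span>With a lossless pickup the restarted run should reproduce the straight run exactly, whereas rounding the state at the restart perturbs it slightly, and in a marginally stable model that could be enough to shift or remove a blowup.</span></p>
<p style="margin-top:0;margin-bottom:0"><span><br>
</span></p>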
<p style="margin-top:0;margin-bottom:0"><span>Regarding the blowup, it looks like what is happening is that a single cell in midwinter will suddenly have very low sea ice concentration (e.g., 0.24 surrounded by approx. 0.9 everywhere else). The location varies
depending on my parameter choices (sometimes near the ice shelf front, sometimes near the boundary), i.e. it's not the same cell every time. Being exposed to such a cold atmosphere leads to very strong buoyancy forcing (virtual salt flux of 30 m/y), and temperature
and salinity deeper in the water column (500-1000 m) take on wildly unrealistic values. </span></p>
<p style="margin-top:0;margin-bottom:0"><span><br>
</span></p>
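<p style="margin-top:0;margin-bottom:0"><span>A minimal Python sketch of how such an isolated low-concentration cell could be located automatically in a 2D sea ice concentration field. The function name and the threshold "drop" are hypothetical, land and ice-shelf masking is ignored, and the field is assumed to be already loaded as a NumPy array.</span></p>
<pre>
```python
import numpy as np


def find_isolated_low_cells(conc, drop=0.5):
    """Return (row, col) indices of cells whose concentration lies more than
    `drop` below the median of their 8 neighbours."""
    conc = np.asarray(conc, dtype=float)
    ny, nx = conc.shape
    # Repeat the edge values so that boundary cells also have 8 neighbours.
    padded = np.pad(conc, 1, mode="edge")
    neighbours = [
        padded[1 + dj : 1 + dj + ny, 1 + di : 1 + di + nx]
        for dj in (-1, 0, 1)
        for di in (-1, 0, 1)
        if (dj, di) != (0, 0)
    ]
    neighbour_median = np.median(np.stack(neighbours), axis=0)
    return np.argwhere(neighbour_median - conc > drop)


# Synthetic example: 0.24 surrounded by 0.9, as in the case described above.
field = np.full((5, 5), 0.9)
field[2, 3] = 0.24
print(find_isolated_low_cells(field))
```
</pre>
<p style="margin-top:0;margin-bottom:0"><span>Using the neighbour median rather than the mean keeps one bad cell from dragging down the reference value for the cells next to it.</span></p>
<p style="margin-top:0;margin-bottom:0"><span><br>
</span></p>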
<p style="margin-top:0;margin-bottom:0"><span>I've tried playing with the sea ice parameters (ice-atmosphere drag, ice-ocean drag, sea ice diffusivity, sea ice strength) but this didn't help; in fact, it always blew up sooner. Now I'm trying a reduced timestep
or increased vertical diffusivity. Next on the list is viscosity. Thanks so much Martin for your suggestions here.</span></p>
<p style="margin-top:0;margin-bottom:0"><span><br>
</span></p>
<p style="margin-top:0;margin-bottom:0"><span>All the best,</span></p>
<p style="margin-top:0;margin-bottom:0"><span>Kaitlin</span></p>
<p style="margin-top:0;margin-bottom:0"><span><br>
</span></p>
<p style="margin-top:0;margin-bottom:0"><span><br>
</span></p>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> MITgcm-support <mitgcm-support-bounces@mitgcm.org> on behalf of Jean-Michel Campin <jmc@mit.edu><br>
<b>Sent:</b> 11 June 2018 19:03:51<br>
<b>To:</b> mitgcm-support@mitgcm.org<br>
<b>Subject:</b> Re: [MITgcm-support] Reproducibility of blowup</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Hi,<br>
<br>
As Martin wrote, restarts are tested on a daily basis, and they generally work (i.e., "pass" the test)<br>
even with compiler optimisations turned on. For instance, on the "engaging" cluster, <br>
with the gfortran compiler: <br>
<a href="http://mitgcm.org/testing/results/2018_06/rs_engaging1-mpi-fast_20180611_0/summary.txt">http://mitgcm.org/testing/results/2018_06/rs_engaging1-mpi-fast_20180611_0/summary.txt</a><br>
and with the Intel compiler:<br>
<a href="http://mitgcm.org/testing/results/2018_06/rs_engaging1-ifort-fast_20180611_0/summary.txt">http://mitgcm.org/testing/results/2018_06/rs_engaging1-ifort-fast_20180611_0/summary.txt</a><br>
The few experiments that don't pass the test are known to have restart issues (e.g., with pkg/fizhi).<br>
<br>
But occasionally, with some specific compiler options, a restart test might be broken,<br>
like here: <a href="http://mitgcm.org/testing/results/2018_05/rs_svante-pgiMPI_20180518_0/summary.txt">
http://mitgcm.org/testing/results/2018_05/rs_svante-pgiMPI_20180518_0/summary.txt</a><br>
<br>
Cheers,<br>
Jean-Michel<br>
<br>
On Mon, Jun 11, 2018 at 04:51:10PM +0000, Menemenlis, Dimitris (329C) wrote:<br>
> Just to be sure: are you running with all compiler optimization flags off, e.g., using the -ieee option when you generate the makefile? Bit reproducibility is not guaranteed when compiler optimization flags are turned on.<br>
> <br>
> > On Jun 11, 2018, at 9:04 AM, Martin Losch <Martin.Losch@awi.de> wrote:<br>
> > <br>
> > Hi Kaitlin,<br>
> > <br>
> > the restarts are continuously tested (2 + 2 timesteps = 4 timesteps?) and you can be pretty sure that the restarts work.
<br>
> > Having said that, I think that there is a difference between storing and reading 64bit data (real*8 in fortran) and having this data in memory. I have the impression that there is internally more precision available, which is rounded when you
write a pickup, and this leads to the behavior that you described. I have experienced the same thing with a vector computer, and I have not gotten to the bottom of it. Your model is probably marginally stable and small changes can modify a blowup situation.
You probably need to increase viscosity a little or reduce your time step.<br>
> > <br>
> > Martin<br>
> > <br>
> > <br>
> > <br>
> >> On 11. Jun 2018, at 14:51, Naughten, Kaitlin A. <kaight@bas.ac.uk> wrote:<br>
> >> <br>
> >> Hello,<br>
> >> <br>
> >> I am trying to diagnose a blowup in my simulation by first reducing the checkpoint frequency (so I can get as close as possible to the blowup on successive restarts) and then setting the monitor frequency to monitor every timestep, so I can catch the exact
timestep when the model blows up and output some fields to figure out what's going on.<br>
> >> <br>
> >> Reducing the checkpoint frequency from monthly to daily worked just fine, giving me the pickup file within 1 day of the blowup. Now I've reduced the checkpoint frequency to hourly, meaning I would expect the model to blow up at some point before 24 checkpoints
are written. However, the model has now written 335 hourly checkpoints and counting (almost 14 days), meaning it's soared past the point where it originally blew up in the daily-checkpoint simulation!<br>
> >> <br>
> >> I've already checked that the model is bit-reproducible at least for the first 10 minutes of a simulation, and that it remains bit-reproducible when I change the checkpoint frequency. So there's nothing wrong with my compilers in that respect. I'm guessing
the key difference here is that I'm restarting from a checkpoint. Should I expect that process to be bit-reproducible? In other words, is it suspicious that my blowup disappears when I stop and restart the model?<br>
> >> <br>
> >> Many thanks,<br>
> >> Kaitlin<br>
> >> <br>
> >> This message (and any attachments) is for the recipient only. NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act. Any material
supplied to NERC may be stored in an electronic records management system.<br>
> >> _______________________________________________<br>
> >> MITgcm-support mailing list<br>
> >> MITgcm-support@mitgcm.org<br>
> >> <a href="http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support">http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support</a><br>
> > <br>
</div>
</span></font></div>
</body>
</html>