[MITgcm-support] pickup question

Fri Apr 23 10:24:01 EDT 2021

Hi,

The MITgcm code is supposed to restart exactly, so that, in your case, the second run that
started from a pickup should also blow up at the same time-step.
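
To make "restart exactly" concrete, here is a toy sketch (not MITgcm code). The
logistic map stands in for a chaotic model time-step, and the helper names
(step, run, write_pickup, read_pickup) are made up for the illustration. It
assumes the state is saved at full 64-bit precision, so nothing is lost in
the pickup.

```python
import struct

def step(x, r=3.9):
    """One step of the logistic map, a stand-in for a chaotic model time-step."""
    return r * x * (1.0 - x)

def run(x, n_steps):
    """Advance the state x by n_steps steps."""
    for _ in range(n_steps):
        x = step(x)
    return x

def write_pickup(x):
    """Serialize the state as a big-endian 64-bit float (no precision lost)."""
    return struct.pack(">d", x)

def read_pickup(blob):
    """Read the state back from the serialized form."""
    return struct.unpack(">d", blob)[0]

x0 = 0.2
straight = run(x0, 200)                     # one run of 2N steps
pickup = write_pickup(run(x0, 100))         # first N steps, then save the state
restarted = run(read_pickup(pickup), 100)   # restart from the pickup, N more steps

print(straight == restarted)  # True: same state, same operations, same order
```

Because the restarted run repeats the same operations on the same bits, it
follows the same trajectory, chaotic or not.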

The reason I can write this is that it's tested (every night) for many different set-ups
and a few different compilers.
The latest results are there: https://mitgcm.org/testing-summary/
where the result of the "restart" test (third column) is given as the fraction of
experiments that pass the test (i.e., zero difference in the pickup files) over the full
number of experiments that run the test.
(e.g., 92:93, on villon with gfortran + MPI and multi-threaded: 
 http://mitgcm.org/testing/results/2021_04/rs_villon-a_20210423_0/summary.txt )
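
If you want to apply the same "zero difference in the pickup files" criterion
to your own two runs, a minimal sketch could look like the one below. It is
not the script used by the nightly testing. The function name compare_pickups
is made up, and it assumes the pickup .data files are plain binary made of
big-endian 64-bit reals; adjust it if your build writes them differently.

```python
import os
import struct
import tempfile

def compare_pickups(path_a, path_b):
    """Return (identical, max_abs_diff) for two files of big-endian float64 values."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        raw_a, raw_b = fa.read(), fb.read()
    if raw_a == raw_b:
        return True, 0.0
    if len(raw_a) != len(raw_b) or len(raw_a) % 8 != 0:
        return False, float("nan")  # sizes differ or not whole 8-byte words
    n = len(raw_a) // 8
    vals_a = struct.unpack(f">{n}d", raw_a)
    vals_b = struct.unpack(f">{n}d", raw_b)
    return False, max(abs(a - b) for a, b in zip(vals_a, vals_b))

# Demo on two tiny synthetic files (not real MITgcm output).
with tempfile.TemporaryDirectory() as tmp:
    ref = os.path.join(tmp, "ref.data")
    new = os.path.join(tmp, "new.data")
    with open(ref, "wb") as f:
        f.write(struct.pack(">3d", 1.0, 2.0, 3.0))
    with open(new, "wb") as f:
        f.write(struct.pack(">3d", 1.0, 2.0, 3.5))
    same = compare_pickups(ref, ref)
    diff = compare_pickups(ref, new)

print(same)  # (True, 0.0)
print(diff)  # (False, 0.5)
```

In practice you would pass the paths of the pickup written at the same
iteration by the straight run and by the restarted run. Any result other
than (True, 0.0) means the restart is not exact.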

These tests generally work well without compiler optimisation (-O0), or with optimisation
but using a careful selection of compiler flags that make it work (this is the
case for the standard intel compiler optfile: tools/build_options/linux_amd64_ifort )
but not necessarily so well with any compiler and any compiler optimisation
(e.g., with PGI compiler + MPI and -O2, optfile: tools/build_options/linux_amd64_pgf77
 http://mitgcm.org/testing/results/2021_04/rs_svante-pgiMPI_20210422_0/summary.txt
we have several experiments that fail the test but did pass when no compiler optimisation
was used).

And regarding restart issues with multi-proc runs (using MPI, Matt's point), it should
still restart exactly, but:
a) if the second time you pick different procs (from a cluster) it's more difficult
   to make sure that it will still pass the restart test.
b) there are some pieces of code we don't use in all these tested set-ups (e.g.
   we avoid GLOBAL_SUM and use GLOBAL_SUM_TILE instead) for this reason.
   And if you are using some less tested pieces of code or some combination
   of options, the set-up might not pass the restart test even without compiler
   optimisation.
c) and if the number of processors you use changes, it will pretty much always fail
   the restart test (even if the tiling of the domain does not change).
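
One generic reason why global sums are delicate here: floating-point addition
is not associative, so the way partial sums are grouped can change the last
bit of the result. The sketch below is plain Python, not the MITgcm
GLOBAL_SUM code, and the function names are made up for the illustration.

```python
def ordered_sum(values):
    """Add the values one by one, left to right, in plain double precision."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_in_groups(groups):
    """Sum each group separately, then add the partial sums in order."""
    return ordered_sum([ordered_sum(g) for g in groups])

a = sum_in_groups([[0.1, 0.2], [0.3]])   # (0.1 + 0.2) + 0.3
b = sum_in_groups([[0.1], [0.2, 0.3]])   # 0.1 + (0.2 + 0.3)

print(a == b)      # False
print(abs(a - b))  # about 1.1e-16, a last-bit difference
```

Once a global sum differs in the last bit, the pickup files no longer have
zero difference, and the model's own chaos does the rest.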

Cheers,
Jean-Michel

On Thu, Apr 22, 2021 at 09:29:27AM -0700, Matthew Mazloff wrote:
> If you are running multicore the default MITgcm setup isn't exactly reproducible due to round-off differences amplified by the intrinsic ocean chaos. So it will blow up at slightly different time-steps. 
> 
> The restart should be the exact same at the start of the run though and for at least a few timesteps. 
> 
> Matt
> 
> 
> > On Apr 22, 2021, at 7:00 AM, ????????? <lanchiyu12 at gmail.com> wrote:
> > 
> > Dear all,
> > 
> > I am running a regional ocean circulation model. Something strange happened. I use the pickup to restart the model after it blows up. However, the restarted run is not exactly the same as the previous one. I use the last pickup before the blow-up time, and the restarted model can then run even longer than the previous blow-up time.
> > 
> > By the way, I use "pickupStrictlyMatch=.TRUE." in the data file. I mainly use the KPP, the EXF (with surface SSS and SST relaxation), and OBCS packages.
> > 
> > Any ideas would be appreciated.
> > 
> > Lambert
> > _______________________________________________
> > MITgcm-support mailing list
> > MITgcm-support at mitgcm.org
> > http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
> 
