[MITgcm-support] Baroclinic instability with MPI run
Jean-Michel Campin
jmc at ocean.mit.edu
Mon Jan 26 08:45:37 EST 2015
Hi Noriyuki,
There are some encouraging things in what you are reporting.
But I wonder whether your flat-bottom set-up is not too regular
(grid + initial condition + forcing), preventing baroclinic
instability from developing. This can happen, and generally adding
a small amount of noise to the initial conditions (e.g., to the initial temperature)
is enough to break the perfect symmetry and to allow the instability to grow.
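
For instance, a minimal sketch of writing such a perturbed initial temperature
file (the grid size, T_south value and file name below are just placeholders
for your channel set-up; MITgcm's default big-endian real*4 input format,
readBinaryPrec=32, read via hydrogThetaFile in "data", is assumed):

import numpy as np

nx, ny, nz = 320, 80, 30     # placeholder grid dimensions
T_south = 2.0                # placeholder: coldest restoring temperature (deg C)

# uniform initial state plus O(1e-3 K) random noise to break the zonal symmetry
T = np.full((nz, ny, nx), T_south)
T += 1.0e-3 * np.random.randn(nz, ny, nx)

# write in MITgcm's default input format (big-endian real*4)
T.astype('>f4').tofile('theta_init_noise.bin')
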
Cheers,
Jean-Michel
On Mon, Jan 26, 2015 at 08:14:21PM +0900, Noriyuki Yamamoto wrote:
> Hi Jean-Michel,
>
> Sorry for my late reply.
> I have tried your suggestions and have begun to think that MPI may not be
> the main cause of the problem.
> I've got a very confusing result.
>
> Firstly, a verification experiment for 'exp4' passed the test
> (using the testreport script) with the same compiler and optfile.
> With MPI, because our system requires queueing a job to run with MPI,
> I manually compared the dynamic field statistics in STDOUT.0000 to those
> in results/output.txt.
> They showed good agreement.
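>
> For reference, a rough sketch of that kind of comparison (assuming the
> standard "%MON" monitor lines in both files; the paths are placeholders):
>
> import re
>
> def monitor_stats(path):
>     # keep the last value of each %MON statistic found in a MITgcm log file
>     stats = {}
>     with open(path) as f:
>         for line in f:
>             m = re.search(r'%MON\s+(\S+)\s*=\s*(\S+)', line)
>             if m:
>                 try:
>                     stats[m.group(1)] = float(m.group(2))
>                 except ValueError:
>                     pass
>     return stats
>
> run = monitor_stats('STDOUT.0000')
> ref = monitor_stats('results/output.txt')
> for key in sorted(set(run) & set(ref)):
>     print(f'{key:35s} run={run[key]: .9e}  ref={ref[key]: .9e}')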
>
> Secondly, I ran the model with lower compiler optimisation (-O1, -O0).
> In the flat-bottomed case with MPI, with -O2 optimisation the baroclinic
> instability has a zonal wavenumber of nPx (i.e., the number of tiles
> in the east-west direction) and doesn't cascade up or down, whereas
> with -O1 and -O0 optimisation the baroclinic instability doesn't occur
> in the first place (at least up to day 400).
> With the -O2 option, the instability develops after 30 days of integration.
> The model is forced by a westerly wind and by temperature restoring along
> the northern and southern walls. The wind and the restoring temperature are
> both zonally constant.
> The initial state is u, v, w = 0 and T = T_south (the restoring
> temperature along the southern wall, which is the coldest) everywhere.
>
> I also tried lowering the compiler optimisation for the no-MPI run.
> In the flat-bottomed case, while the -O2 run shows baroclinic instability
> which cascades up by day 30 (different here from the MPI run, as I
> mentioned in my previous mail), the -O1 and -O0 runs don't show any
> instability waves.
> On the other hand, in the wavy (zonally sinusoidal) topography case,
> the baroclinic instability cascades up in the same way regardless of
> the optimisation level.
>
> Tests of the verification experiment exp4 with the three levels of
> optimisation also passed.
> So perhaps some of my settings which aren't used in exp4, or their
> combination, cause this problem?
>
> In case it helps, I also tried changing (nSx, nPx) and setting
> GLOBAL_SUM_SEND_RECV in CPP_EEOPTIONS.h, though with -O2 optimisation.
> Runs with (nSx, nPx) = (1, 16) and (2, 8) show the same pattern of
> surface temperature evolution, whose zonal wavenumber is constantly 16.
>
> What should I do next? Maybe a debug run?
>
> Any suggestions will be greatly appreciated,
> Noriyuki.
>
> On 2015/01/23 0:28, Noriyuki Yamamoto wrote:
> >Hi Jean-Michel,
> >
> >Thank you for your quick reply and suggestions!
> >I'm compiling with an optfile I modified from linux_amd64_ifort+mpi_sal_oxford.
> >I attach my optfile here.
> >
> >
> >
> >I'll try your suggestions tomorrow.
> >Does "#define GLOBAL_SUM_SEND_RECV" create the global output files or just check MPI operations?
> >
> >Thanks,
> >Noriyuki
> >
> >On 2015/01/22 23:09, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
> >
> >>Hi Noriyuki,
> >>
> >>Which optfile are you compiling with?
> >>
> >>Otherwise, a few other things here:
> >>
> >>1) Although I asked Chris for a full report about the set-up in order to reproduce it
> >>(easy since I have access to the same computer), to my knowledge,
> >>the "Independ Tiling" problem has never been "reproducible".
> >>
> >>2) One potential problem could be compiler optimisation.
> >>To clarify this point, you could:
> >>a) with the same compiler, MPI and optfile, try to run a few simple
> >>verification experiments (e.g., exp4) and compare the output
> >>with the reference output (e.g., exp4/results/output.txt).
> >>There is a script (verification/testreport) that does that for all
> >>or a sub-set of the experiments and might not be too difficult to use
> >>(testreport -h for a list of options).
> >>b) you could try to lower the level of compiler optimisation.
> >>The default is "-O2" (from the linux_amd64_ifort11 optfile); you could try
> >>"-O1" (it will run slower) and "-O0" (even slower).
> >>If "-O0" fixes the problem, then we should try to find which
> >>routine causes the problem and compile just that one with "-O0"
> >>(since "-O0" for all the src code is far too slow).
> >>
> >>3) Another source of problem could be the code itself. This is not
> >>very likely with most standard options and pkgs (since they are
> >>tested on a regular basis) but can definitely happen.
> >>a) you can check whether it's due to a tiling problem or an MPI problem
> >>simply by running with the same sNx but decreasing nPx while
> >>increasing nSx (to keep the same total number of tiles = nSx*nPx);
> >>a small sketch of this arithmetic follows below.
> >>If you compile with "#define GLOBAL_SUM_SEND_RECV" in CPP_EEOPTIONS.h
> >>(slower, but it makes the "global-sum" results independent of the number
> >>of processors, although still dependent on the domain tiling) and run the 2 cases
> >>(with different nPx), you could expect to get the same results.
> >>b) if all the previous suggestions do not help, you could provide a
> >>copy of your set-up (checkpoint64u is fairly recent) so that we will
> >>try to reproduce it. You could start with your customised code dir
> >>and set of parameter files (data*).
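> >>
> >>As a small sketch of the tiling arithmetic in point 3a (the domain and tile
> >>sizes below are placeholders): with sNx fixed, any (nSx, nPx) pair with the
> >>same product covers the same Nx = sNx*nSx*nPx grid, so such runs should be
> >>directly comparable.
> >>
> >>Nx, sNx = 320, 20              # placeholder: x domain size and tile width
> >>n_tiles_x = Nx // sNx          # total number of tiles in x (= nSx*nPx)
> >>
> >># equivalent decompositions, e.g. (nSx, nPx) = (1, 16), (2, 8), (4, 4), ...
> >>for nPx in range(1, n_tiles_x + 1):
> >>    if n_tiles_x % nPx == 0:
> >>        nSx = n_tiles_x // nPx
> >>        print(f"nSx = {nSx:2d}, nPx = {nPx:2d}, tiles in x = {nSx * nPx}")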
> >>
> >>Cheers,
> >>Jean-Michel
> >>
> >>On Thu, Jan 22, 2015 at 09:02:18PM +0900, Noriyuki Yamamoto wrote:
> >>>Hi all,
> >>>
> >>>I'm running into a problem with an MPI run.
> >>>Outputs from the MPI and no-MPI runs differ qualitatively.
> >>>
> >>>This seems to be the same problem reported in the "Independ Tiling"
> >>>thread (http://forge.csail.mit.edu/pipermail/mitgcm-support/2014-March/009017.html).
> >>>Is there any progress on it?
> >>>If not, I hope this information will add some clues towards fixing it.
> >>>
> >>>The model is a zonally periodic channel forced by a westerly
> >>>wind and by temperature restoring along the northern and southern walls
> >>>at mid-to-high latitudes.
> >>>I tested some cases with different topography.
> >>>In the flat-bottomed case without MPI, baroclinic instability develops
> >>>and cascades up to larger scales.
> >>>But with MPI, the zonal wavenumber of the baroclinic instability is
> >>>fixed to nPx during the 3000-day integration (I tested nPx = 16, 20)
> >>>and doesn't cascade up.
> >>>In the zonally wavy topography case (a sinusoidal wave whose wavenumber k
> >>>is not nPx), with MPI and using the same MPI executable (compiled with
> >>>genmake -mpi) as the flat-bottomed case,
> >>>the baroclinic instability cascades up and the result seems to be similar
> >>>to that of the no-MPI run, though I checked only the early surface
> >>>temperature distribution.
> >>>
> >>>In the MPI runs I tried two patterns of (nPx, nPy) = (16, 4) and (20, 4).
> >>>I compiled the MITgcm code with the Intel compiler 13.1.3 and the Cray MPI
> >>>library 6.3.0 on SUSE Linux Enterprise Server 11 (x86_64).
> >>>The MITgcm version is checkpoint64u (sorry that it's not the latest version).
> >>>If necessary, I will attach my data and SIZE.h files later.
> >>>
> >>>Sorry for my poor English.
> >>>Noriyuki.
> >>>
> >>>--
> >>>Noriyuki Yamamoto
> >>>PhD Student - Physical Oceanography Group
> >>>Division of Earth and Planetary Sciences,
> >>>Graduate School of Science, Kyoto University.
> >>>Mail:nymmto at kugi.kyoto-u.ac.jp
> >>>Tel:+81-75-753-3924
> >>>
> >>>
> >>
> >
> >
>
>
> --
> Noriyuki Yamamoto
>
> Division of Earth and Planetary Sciences, Graduate School of Science,
> Kyoto University; Department of Geophysics, Physical Oceanography
> Laboratory, second-year PhD student
> Mail: nymmto at kugi.kyoto-u.ac.jp
> Tel: 075-753-3924
>