[MITgcm-support] Baroclinic instability with MPI run

Noriyuki Yamamoto nymmto at kugi.kyoto-u.ac.jp
Fri Jan 30 01:39:18 EST 2015


Hi Jean-Michel,

I've finally got the same result from the MPI and no-MPI runs with a
flat bottom, regardless of optimisation level.
As you suggested, making the initial temperature slightly noisy
(north-south linear distribution + white noise) brings the two into
agreement.
Small numerical differences at -O2 optimisation probably allowed
baroclinic instability to develop in the flat-bottomed case with the
previous initial condition.
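
For reference, here is a minimal sketch of how such a noisy initial
temperature file can be generated (grid sizes, temperature range and
noise amplitude are illustrative placeholders, not my actual values):

      PROGRAM GEN_TINIT
C     Sketch: write a raw big-endian real*8 initial temperature field
C     (north-south linear gradient plus weak white noise) that MITgcm
C     can read in via hydrogThetaFile.
      IMPLICIT NONE
      INTEGER nx, ny, nz
      PARAMETER ( nx=320, ny=64, nz=20 )
      REAL*8 T(nx,ny,nz), Tsouth, Tnorth, noise
      INTEGER i, j, k
      Tsouth = 0.D0
      Tnorth = 8.D0
      DO k=1,nz
       DO j=1,ny
        DO i=1,nx
         CALL RANDOM_NUMBER( noise )
C        linear north-south gradient + O(1.e-3 degC) white noise
         T(i,j,k) = Tsouth + (Tnorth-Tsouth)*DBLE(j-1)/DBLE(ny-1)
     &            + 1.D-3*( noise - 0.5D0 )
        ENDDO
       ENDDO
      ENDDO
C     CONVERT='big_endian' is an ifort/gfortran extension; with ifort,
C     a byte-valued RECL also needs "-assume byterecl".
      OPEN( 11, FILE='T_init.bin', FORM='unformatted',
     &      ACCESS='direct', RECL=8*nx*ny*nz, CONVERT='big_endian' )
      WRITE(11,REC=1) T
      CLOSE(11)
      END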

Thank you very much for your fruitful suggestions,
Noriyuki

On 2015/01/26 22:45, Jean-Michel Campin wrote:
> Hi Noriyuki,
>
> There are some encouraging things in what you are reporting.
> But I wonder if your set-up with a flat bottom is not too regular
> (grid + initial condition + forcing), preventing baroclinic
> instability from developing. This can happen, and generally adding
> a small amount of noise to the initial conditions (e.g., to the
> initial temperature) is enough to break the perfect symmetry and
> allow the instability to grow.
>
> Cheers,
> Jean-Michel
>
> On Mon, Jan 26, 2015 at 08:14:21PM +0900, Noriyuki Yamamoto wrote:
>> Hi Jean-Michel,
>>
>> Sorry for my late reply.
>> I have tried your suggestions and begun to think that MPI may not be
>> the main cause of the problem.
>> I've got some very confusing results.
>>
>> Firstly, a verification experiment for 'exp4' passed the test
>> (using the testreport script) with the same compiler and optfile.
>> For the MPI case, because our system requires submitting jobs
>> through a queue, I manually compared the dynamic field statistics
>> in STDOUT.0000 to those in results/output.txt (commands sketched
>> below). They showed good agreement.
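>>
>> For reference, the commands were along these lines (the optfile path
>> is just illustrative, and option names can vary between checkpoints,
>> so check testreport -h):
>>
>>   cd verification
>>   # no-MPI check of exp4 against the reference output
>>   ./testreport -t exp4 -of ../tools/build_options/my_ifort_optfile
>>   # for MPI I built with -mpi, ran through the queue by hand, then
>>   # compared monitor statistics, e.g. (run-dir name may differ):
>>   grep dynstat_theta_mean exp4/run/STDOUT.0000 | head
>>   grep dynstat_theta_mean exp4/results/output.txt | head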
>>
>> Secondly, I ran the model with lower compiler optimisation (-O1, -O0).
>> In the flat-bottomed case with MPI, the -O2 run develops baroclinic
>> instability with zonal wavenumber nPx (the number of tiles in the
>> east-west direction) that doesn't cascade up or down, whereas with
>> -O1 and -O0 baroclinic instability doesn't occur in the first place
>> (at least up to 400 days). With -O2, the instability develops after
>> 30 days of integration.
>> The model is forced by a westerly wind and by temperature restoring
>> along the northern and southern walls; the wind and restoring
>> temperature are both zonally constant.
>> The initial state is u, v, w = 0 and T = T_south (the restoring
>> temperature along the southern wall, which is the coldest) everywhere.
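>>
>> For completeness, this kind of forcing and initial condition enters
>> through the data namelist roughly as below (file names invented for
>> illustration; restoring confined to the walls is often handled with
>> pkg/rbcs rather than a plain climatological relaxation):
>>
>>    &PARM05
>>     bathyFile='bathy.bin',
>>     zonalWindFile='taux.bin',
>>     hydrogThetaFile='T_init.bin',
>>    &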
>>
>> I also tried lowering the compiler optimisation for the no-MPI run.
>> In the flat-bottomed case, the -O2 run shows baroclinic instability
>> that cascades up by 30 days (differing from the MPI run, as I
>> mentioned in my previous mail), while the -O1 and -O0 runs don't
>> show any instability waves.
>> On the other hand, in the wavy (zonally sinusoidal) topography case,
>> baroclinic instability cascades up in the same way regardless of
>> optimisation level.
>>
>> Tests of the verification experiment exp4 with the three levels of
>> optimisation also passed.
>> So perhaps some of my settings that aren't used in exp4, or their
>> combination, cause this problem?
>>
>> In case it helps, I also tried changing (nSx, nPx) and setting
>> GLOBAL_SUM_SEND_RECV in CPP_EEOPTIONS.h, though only with -O2
>> optimisation. Runs with (nSx, nPx) = (1, 16) and (2, 8) show the
>> same pattern of surface temperature evolution, whose zonal
>> wavenumber stays fixed at 16 (tiling sketched below).
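>>
>> Concretely, the two tilings differ only in these SIZE.h lines (the
>> (2, 8) case is shown; nSy = 1 here is illustrative, and sNx, the
>> overlaps, etc. are unchanged), together with the CPP flag:
>>
>>       &           nSx =   2,
>>       &           nSy =   1,
>>       &           nPx =   8,
>>       &           nPy =   4,
>>
>>   C  and in CPP_EEOPTIONS.h:
>>   #define GLOBAL_SUM_SEND_RECV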
>>
>> Should my next step be a debug run?
>>
>> Any suggestions will be greatly appreciated,
>> Noriyuki.
>>
>> On 2015/01/23 0:28, Noriyuki Yamamoto wrote:
>>> Hi Jean-Michel,
>>>
>>> Thank you for your quick reply and suggestions!
>>> I'm compiling with an optfile I modified from linux_amd64_ifort+mpi_sal_oxford.
>>> I attach my optfile here.
>>>
>>>
>>>
>>> I'll try your suggestions tomorrow.
>>> Does "#define GLOBAL_SUM_SEND_RECV" create the global output files or just check MPI operations?
>>>
>>> Thanks,
>>> Noriyuki
>>>
>>> On 2015/01/22 23:09, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
>>>
>>>> Hi Noriyuki,
>>>>
>>>> Which optfile are you compiling with?
>>>>
>>>> Otherwise, a few other things here:
>>>>
>>>> 1) Although I asked Chris for a full report about the set-up in
>>>> order to reproduce it (easy, since I have access to the same
>>>> computer), to my knowledge the "Independ Tiling" problem has never
>>>> been reproducible.
>>>>
>>>> 2) One potential problem could be compiler optimisation.
>>>> To clarify this point, you could:
>>>> a) with the same compiler, MPI and optfile, try to run a few simple
>>>> verification experiments (e.g., exp4) and compare the output
>>>> with the reference output (e.g., exp4/results/output.txt).
>>>> There is a script (verification/testreport) that does this for all
>>>> or a sub-set of the experiments and should not be too difficult to
>>>> use (testreport -h for a list of options).
>>>> b) you could try to lower the level of compiler optimisation.
>>>> The default is "-O2" (from the linux_amd64_ifort11 optfile); you
>>>> could try "-O1" (it will run slower) and "-O0" (even slower).
>>>> If "-O0" fixes the problem, then we should try to find which
>>>> routine causes the problem and compile just that one with "-O0"
>>>> (since "-O0" for all the source code is far too slow; see the
>>>> sketch after this list).
>>>>
>>>> 3) Another source of problems could be the code itself. This is
>>>> not very likely with most standard options and pkgs (since they
>>>> are tested on a regular basis) but can definitely happen.
>>>> a) You can check whether it's a tiling problem or an MPI problem
>>>> simply by running with the same sNx but decreasing nPx while
>>>> increasing nSx (to maintain the same number of tiles = nSx*nPx).
>>>> If you compile with "#define GLOBAL_SUM_SEND_RECV" in
>>>> CPP_EEOPTIONS.h (slower, but it makes the "global-sum" results
>>>> independent of the number of processors, though still dependent on
>>>> the domain tiling) and run the 2 cases (with different nPx), you
>>>> could expect to get the same results.
>>>> b) If none of the previous suggestions help, you could provide a
>>>> copy of your set-up (checkpoint64u is fairly recent) so that we
>>>> can try to reproduce it. You could start with your customised code
>>>> dir and set of parameter files (data*).
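>>>>
>>>> Regarding 2b, a minimal sketch (assuming the genmake2 Makefile's
>>>> FOPTIM variable can be overridden per target; check your Makefile):
>>>>
>>>>   cd build
>>>>   # "some_routine" is a hypothetical name for the suspect source
>>>>   rm -f some_routine.o
>>>>   make FOPTIM='-O0' some_routine.o
>>>>   make        # relink mitgcmuv with the -O0 object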
>>>>
>>>> Cheers,
>>>> Jean-Michel
>>>>
>>>> On Thu, Jan 22, 2015 at 09:02:18PM +0900, Noriyuki Yamamoto wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm running into a problem with an MPI run.
>>>>> Outputs from the MPI and no-MPI runs differ qualitatively.
>>>>>
>>>>> This seems to be the same problem reported in the "Independ Tiling"
>>>>> thread (http://forge.csail.mit.edu/pipermail/mitgcm-support/2014-March/009017.html).
>>>>> Has there been any progress on it?
>>>>> If not, I hope this information will add some clues towards a fix.
>>>>>
>>>>> The model is a zonally periodic channel forced by a westerly wind
>>>>> and by temperature restoring along the northern and southern
>>>>> walls at mid-to-high latitude.
>>>>> I tested some cases with different topography.
>>>>> In the flat-bottomed case without MPI, baroclinic instability
>>>>> develops and cascades up to larger scales.
>>>>> But with MPI, the zonal wavenumber of the baroclinic instability
>>>>> stays locked to nPx throughout the 3000-day integration (I tested
>>>>> nPx = 16, 20) and doesn't cascade up.
>>>>> In the zonally wavy topography case (a sinusoidal wave whose
>>>>> wavenumber k is not nPx), using the same MPI executable (compiled
>>>>> with genmake -mpi) as the flat-bottomed case, the baroclinic
>>>>> instability cascades up and the result seems similar to that of
>>>>> the no-MPI run, though I checked only the early surface
>>>>> temperature distribution.
>>>>>
>>>>> For the MPI runs I tried two patterns of (nPx, nPy) = (16, 4) and
>>>>> (20, 4).
>>>>> I compiled the MITgcm code with the Intel compiler 13.1.3 and the
>>>>> Cray MPI library 6.3.0 on SUSE Linux Enterprise Server 11 (x86_64).
>>>>> The MITgcm version is checkpoint64u (sorry, it's not the latest).
>>>>> If necessary, I will attach my data and SIZE.h files later.
>>>>>
>>>>> Sorry for my poor English.
>>>>> Noriyuki.
>>>>>
>>>>> -- 
>>>>> Noriyuki Yamamoto
>>>>> PhD Student - Physical Oceanography Group
>>>>> Division of Earth and Planetary Sciences,
>>>>> Graduate School of Science, Kyoto University.
>>>>> Mail:nymmto at kugi.kyoto-u.ac.jp
>>>>> Tel:+81-75-753-3924
>>>>>
>>>>>
>>>
>>
>> -- 
>> Noriyuki Yamamoto
>>
>> Division of Earth and Planetary Sciences, Graduate School of Science,
>> Kyoto University
>> Physical Oceanography Laboratory, Department of Geophysics
>> Second-year PhD student
>> Mail: nymmto at kugi.kyoto-u.ac.jp
>> Tel: 075-753-3924
>>
>


-- 
Noriyuki Yamamoto

Division of Earth and Planetary Sciences, Graduate School of Science,
Kyoto University
Physical Oceanography Laboratory, Department of Geophysics
Second-year PhD student
Mail: nymmto at kugi.kyoto-u.ac.jp
Tel: 075-753-3924



