[MITgcm-support] Baroclinic instability with MPI run

Noriyuki Yamamoto nymmto at kugi.kyoto-u.ac.jp
Thu Jan 22 10:28:14 EST 2015


Hi Jean-Michel,

Thank you for your quick reply and suggestions!
I'm compiling with an optfile I modified from linux_amd64_ifort+mpi_sal_oxford.
I attach my optfile here. 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: my_build_option
Type: application/octet-stream
Size: 2472 bytes
Desc: not available
URL: <http://mitgcm.org/pipermail/mitgcm-support/attachments/20150123/ba55e48d/attachment.obj>
-------------- next part --------------
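For reference, this is roughly how I invoke genmake2 with it (the paths below are
simplified placeholders, not my actual directory layout; I pass -mpi as before):

    cd build
    ../tools/genmake2 -mods=../code -of=../my_build_option -mpi
    make depend
    make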


I'll try your suggestions tomorrow.
Does "#define GLOBAL_SUM_SEND_RECV" create the global output files or just check MPI operations?

Thanks,
Noriyuki

On 2015/01/22, at 23:09, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:

> Hi Noriyuki,
> 
> Which optfile are you compiling with?
> 
> Otherwise, a few other things here:
> 
> 1) Although I asked Chris for a full report about the set-up in order to
> reproduce it (easy, since I have access to the same computer), to my knowledge,
> the "Independ Tiling" problem has never been reproducible.
> 
> 2) One potential problem could be compiler optimisation. 
> To clarify this point, you could:
> a) with the same compiler, MPI library and optfile, try to run a few simple 
> verification experiments (e.g., exp4) and compare the output
> with the reference output (e.g., exp4/results/output.txt).
> There is a script (verification/testreport) that does this for all 
> or a subset of the experiments and might not be too difficult to use
> (testreport -h for a list of options); a usage sketch follows at the end
> of this point.
> b) you could try to lower the level of compiler optimisation.
> The default is "-O2" (from the linux_amd64_ifort11 optfile); you could try
> "-O1" (it will run slower) and "-O0" (even slower).
> If "-O0" fixes the problem, then we should try to find which 
> routine causes the problem and compile just that one with "-O0"
> (since "-O0" for all the source code is far too slow); an optfile sketch
> for this also follows below.
> 
> 3) Another source of problems could be the code itself. This is not
> very likely with most standard options and pkgs (since they are
> tested on a regular basis) but can definitely happen.
> a) you can check whether it is a tiling problem or an MPI problem
> simply by running with the same sNx but decreasing nPx while
> increasing nSx (to maintain the same number of tiles = nSx*nPx).
> If you compile with "#define GLOBAL_SUM_SEND_RECV" in CPP_EEOPTIONS.h
> (slower, but it makes the "global-sum" results independent of the number
> of processors, though still dependent on the domain tiling) and run the
> two cases (with different nPx), you would expect to get the same results;
> see the SIZE.h sketch at the end of this list.
> b) if all the previous suggestions do not help, you could provide a 
> copy of your set-up (checkpoint64u is fairly recent) so that we can try
> to reproduce it. You could start with your customized code dir
> and your set of parameter files (data*).
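> For 3a, an illustration of the two SIZE.h settings I mean (the numbers are
> placeholders rather than your actual domain; only nSx and nPx differ
> between the two runs, so sNx and the total number of tiles nSx*nPx stay
> the same):
> 
>     C-- Run A: nPx=16 processes, nSx=1 tile per process (placeholder sizes)
>           PARAMETER (
>          &           sNx =  20,
>          &           sNy =  20,
>          &           OLx =   2,
>          &           OLy =   2,
>          &           nSx =   1,
>          &           nSy =   1,
>          &           nPx =  16,
>          &           nPy =   4,
>          &           Nx  = sNx*nSx*nPx,
>          &           Ny  = sNy*nSy*nPy,
>          &           Nr  =  30)
>     C-- Run B: identical except for these two lines
>     C    &           nSx =   2,
>     C    &           nPx =   8,
> 
> together with, in CPP_EEOPTIONS.h (where it is normally undefined):
> 
>     #define GLOBAL_SUM_SEND_RECV
> 
> so that the global sums no longer depend on the number of processors
> (but still depend on the tiling).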
> 
> Cheers,
> Jean-Michel
> 
> On Thu, Jan 22, 2015 at 09:02:18PM +0900, Noriyuki Yamamoto wrote:
>> Hi all,
>> 
>> I'm running into a problem with an MPI run.
>> Outputs from the MPI and no-MPI runs differ qualitatively.
>> 
>> This seems to be the same problem reported in the "Independ Tiling"
>> thread (http://forge.csail.mit.edu/pipermail/mitgcm-support/2014-March/009017.html).
>> Is there any progress on it?
>> If not, I hope this information will add some clues towards fixing it.
>> 
>> The model is a zonally periodic channel forced by westerly wind and by
>> temperature restoring along the northern and southern walls at
>> mid-to-high latitudes.
>> I tested several cases with different topography.
>> In the flat-bottomed case without MPI, baroclinic instability develops
>> and cascades up to larger scales.
>> But with MPI, the zonal wavenumber of the baroclinic instability stays
>> locked to nPx during the 3000-day integration (I tested nPx = 16, 20)
>> and does not cascade up.
>> In the case with zonally wavy topography (a sinusoidal wave whose
>> wavenumber k is not equal to nPx), run with the same MPI executable
>> (compiled with genmake -mpi) as the flat-bottomed case, the baroclinic
>> instability cascades up and the result seems similar to that of the
>> no-MPI run, though I have checked only the early surface temperature
>> distribution.
>> 
>> In the MPI runs I tried two configurations, (nPx, nPy) = (16, 4) and (20, 4).
>> I compiled the MITgcm code with the Intel compiler 13.1.3 and the Cray MPI
>> library 6.3.0 on SUSE Linux Enterprise Server 11 (x86_64).
>> The MITgcm version is checkpoint64u (sorry that it's not the latest version).
>> If necessary, I will attach the data and SIZE.h files later.
>> 
>> Sorry for my poor English.
>> Noriyuki.
>> 
>> -- 
>> Noriyuki Yamamoto
>> PhD Student - Physical Oceanography Group
>> Division of Earth and Planetary Sciences,
>> Graduate School of Science, Kyoto University.
>> Mail:nymmto at kugi.kyoto-u.ac.jp
>> Tel:+81-75-753-3924
>> 
>> 
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-support
> 
> 


