[MITgcm-support] Domain decompositions affecting simulations outcome

Martin Losch Martin.Losch at awi.de
Tue May 14 02:52:21 EDT 2024


Hi Fabio,

thanks for your clear illustration of the problem. I am afraid that these differences can really be due to the domain decomposition, because the domain decomposition can change the order of summation in the global sums of the pressure solver.
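The underlying reason is that floating-point addition is not associative, so regrouping the tile-local partial sums changes the result at round-off level. A minimal standalone sketch of the effect (plain Fortran, not MITgcm code):

      program sumorder
c     Floating-point addition is not associative: the same three
c     numbers summed with a different grouping give a result that
c     differs at round-off level.
      real*8 a, b, c
      a = 1.d0
      b = 1.d-16
      c = 1.d-16
      print *, '(a+b)+c = ', (a+b)+c
      print *, 'a+(b+c) = ', a+(b+c)
      end

The two printed values differ by one unit in the last place; in a nonlinear model such round-off differences grow with time, which is consistent with your time series diverging after a few days.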
In eesupp/inc/CPP_EEOPTIONS.h you will find this:

> C--   Always cumulate tile local-sum in the same order by applying MPI allreduce
> C     to array of tiles ; can get slower with large number of tiles (big set-up)
> #define GLOBAL_SUM_ORDER_TILES
> 
> C--   Alternative way of doing global sum without MPI allreduce call
> C     but instead, explicit MPI send & recv calls. Expected to be slower.
> #undef GLOBAL_SUM_SEND_RECV
> 
> C--   Alternative way of doing global sum on a single CPU
> C     to eliminate tiling-dependent roundoff errors. Note: This is slow.
> #undef  CG2D_SINGLECPU_SUM

You can try a run with CG2D_SINGLECPU_SUM defined to make sure that the order of summation does not change with the decomposition, but this may significantly slow down your runs.
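For reference, trying this amounts to a one-line change in (your setup's copy of) CPP_EEOPTIONS.h, followed by a full recompile; roughly:

C--   Alternative way of doing global sum on a single CPU
C     to eliminate tiling-dependent roundoff errors. Note: This is slow.
#define CG2D_SINGLECPU_SUM

With that defined, the cg2d global sums are done on a single CPU and should no longer depend on the tiling; other diagnostics may still show round-off level differences.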

I am assuming that the dotted lines are the same domain decompositions with the blank tiles excluded (exch2?).
I am surprised that this also gives different results; I assume that it, too, changes the order of summation a little, even though it is only a matter of summing zeros.

Martin

> On 13. May 2024, at 18:30, Giordano, Fabio <fgiordano at ogs.it> wrote:
> 
> Dear MITgcm community,
> I am a PhD student at the National Institute of Oceanography and Applied Geophysics – OGS (Italy).
> I am running a batch of simulations for the northern Adriatic Sea (a semi-enclosed basin in the Mediterranean) and I am testing several possible decompositions of the domain (494 cells in the x-direction, 300 in the y-direction, 1/128° resolution); unexpectedly, I am getting slightly different results. I am using the HPC infrastructure G100 (https://www.hpc.cineca.it/systems/hardware/galileo100/).
> I did some tests, simulating just 30 days, starting from January 1st 2020, with atmospheric forcing from a Limited Area Model (2.2 km resolution) and initial and boundary conditions from the EU Copernicus Marine Service Mediterranean Reanalysis.
> I ran 8 different setups, changing only the domain decomposition (same forcing, parameters, namelists, etc.): I chose 4 different tilings and then doubled the experiments by also running them with the EXCH2 package, obtaining the following setups:
> 95 processors (19 × 5 in x and y respectively) → 72 processors with EXCH2
> 190 processors (19 × 10 in x and y respectively) → 130 processors with EXCH2
> 520 processors (26 × 20 in x and y respectively) → 339 processors with EXCH2
> 760 processors (38 × 20 in x and y respectively) → 474 processors with EXCH2
> The domain and the different decompositions are shown in Figures 1–4.
> From these setups I plotted, in Figure 5, daily sea surface temperature time series for a single cell in the open sea (upper panel), marked by the black dot in Figures 1–4. I also plotted the horizontal average over the whole upper level (lower panel), which seems to smooth out the differences. The plots clearly show that the outputs diverge after a few days.
> To check whether it is a machine-dependent issue, I also ran the last setup (474 processors) twice and got identical results (identical STDOUTs).
> 
> My question is: did I mess up something or is this an expected behaviour related to the domain decomposition? And if so, what is the reason?
> 
> Thank you very much for the support.
> Kind regards,
> 
> Fabio Giordano
> Sezione di Oceanografia
> Istituto Nazionale di Oceanografia e di Geofisica Sperimentale - OGS
> via Beirut n. 2
> 34151 Trieste - Italia
> 
> [Attachments: Figure01_095p.png, Figure02_190p.png, Figure03_520p.png, Figure04_760p.png, Figure05_8testsComparison.png]
