[MITgcm-support] Domain decompositions affecting simulations outcome

Martin Losch Martin.Losch at awi.de
Tue May 21 01:45:45 EDT 2024


Hi Fabio,

thanks for these tests. I am not sure I can help you any further with this, as you have already figured out all the decomposition-related issues:

1. the non-hydrostatic code (cg3d.F) needs to include the CG2D_SINGLECPU_SUM flag for reproducible results (possible, but not yet done)

2. the obcs sponge code is not tile-proof, and we may want to say this somewhere in the documentation. The (most likely) reason is that a tile is flagged as an obcs tile only if it contains an open boundary, and the obcs operations (including the sponge) are carried out only on flagged tiles. With your sNy=12, the second tile is no longer an obcs tile and is not touched, although your 15-cell sponge extends into it. I guess you can call this a bug. sNy<30, however, may not be a good choice from a performance point of view, as the MPI and overlap overhead becomes large compared to the tile size. My suggestion here is to make sure that the sponge layer width is smaller than the tile size. We could add a warning or even a stop if one chooses a sponge that is too wide (compared to sNx/sNy). It would be nice if you could raise an issue on GitHub.

3. in a chaotic simulation (i.e. small differences in initial conditions change the simulation dramatically), **any** small difference (in the initial conditions or of a numerical nature) leads to very different results; this is a very general observation/experience. The statistics (mean, standard deviation, and also higher moments) of the results should still be “the same”. In this sense your observation in Figure 4 is no surprise.
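
To illustrate point 1: floating-point addition is not associative, so a global sum built from per-tile partial sums can change with the tiling, while a sum carried out in one fixed order does not. The following is only a minimal Python sketch of that effect, not MITgcm code; the function names and the four test values are made up for illustration. The loop is written out by hand because the built-in sum() in recent Python versions compensates for rounding.

```python
import math

def naive_sum(xs):
    """Plain left-to-right floating-point sum (no compensation)."""
    total = 0.0
    for x in xs:
        total += x
    return total

def tiled_sum(values, n_tiles):
    """Sum each tile locally, then add the partial sums in tile order."""
    tile_len = len(values) // n_tiles
    partials = [naive_sum(values[i * tile_len:(i + 1) * tile_len])
                for i in range(n_tiles)]
    return naive_sum(partials)

# Hypothetical data with a large cancellation; the exact sum is 2.0.
values = [1.0, 1e16, -1e16, 1.0]

one_tile = tiled_sum(values, 1)   # 1.0: one small term is lost to rounding
two_tiles = tiled_sum(values, 2)  # 0.0: both small terms are lost
exact = math.fsum(values)         # 2.0: correctly rounded reference

print(one_tile, two_tiles, exact)
```

The two tilings disagree with each other and with the exact value, although the input is identical. In a conjugate gradient solver such round-off differences enter every iteration through the dot products.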

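The suspected mechanism in point 2 can be sketched as a toy model in Python. This is not the actual obcs code: the per-tile rule below is only my guess as described above, the function names are hypothetical, and the southern boundary is assumed to sit at row 0.

```python
def sponge_rows_relaxed(s_ny, sponge_width, boundary_row=0):
    """Sponge rows that are relaxed under the assumed per-tile rule."""
    relaxed = []
    for j in range(boundary_row, boundary_row + sponge_width):
        # Assumed rule: a row is processed only if its tile also
        # contains the open-boundary row.
        if j // s_ny == boundary_row // s_ny:
            relaxed.append(j)
    return relaxed

def check_sponge_fits(s_ny, sponge_width):
    """Return a warning string if the sponge is wider than one tile."""
    if sponge_width > s_ny:
        missed = sponge_width - s_ny
        return (f"sponge width {sponge_width} exceeds sNy={s_ny}: "
                f"{missed} rows lie outside the boundary tile")
    return None

# Tile sizes from the four test runs, with a 15-cell sponge.
for s_ny in (30, 20, 15, 12):
    n_relaxed = len(sponge_rows_relaxed(s_ny, sponge_width=15))
    print(s_ny, n_relaxed, check_sponge_fits(s_ny, 15))
```

In this toy model only sNy=12 leaves sponge rows untouched (12 of 15 rows are relaxed). A check like check_sponge_fits is the kind of warning or stop we could add at initialisation.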

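Point 3 can be seen in a textbook chaotic system. The sketch below uses the logistic map purely as a stand-in for the ocean model (it has nothing to do with MITgcm itself, and the perturbation size of 1e-12 is an arbitrary choice): two runs that start almost identically soon differ completely step by step, yet their long-term means stay close.

```python
def logistic_orbit(x0, n_steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x) and return the orbit."""
    orbit = []
    x = x0
    for _ in range(n_steps):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit

def mean(xs):
    return sum(xs) / len(xs)

# Two runs whose initial conditions differ by a round-off-sized amount.
run_a = logistic_orbit(0.3, 20000)
run_b = logistic_orbit(0.3 + 1e-12, 20000)

# Largest pointwise difference within the first 200 steps.
max_diff = max(abs(a - b) for a, b in zip(run_a[:200], run_b[:200]))

print("max pointwise difference:", max_diff)
print("means:", mean(run_a), mean(run_b))
```

The pointwise difference grows to the size of the signal itself, while the two means agree closely. This is why comparing longer-term averages is the sensible approach here.
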
I still don’t know if the pkg/exch2 vs eesupp/src/exch code differences are expected; maybe others can chime in.

Martin

> On 20. May 2024, at 16:57, Giordano, Fabio <fgiordano at ogs.it> wrote:
> 
> Hi Martin,
> Thanks for the very useful reply.
> As regards your question ("I am assuming that the dotted lines are the same domain decompositions with the blank tiles excluded (exch2?)."), yes, the dotted lines refer exactly to the same setups, but with the EXCH2 package activated.
> 
> I ran some more tests. As you suggested, I tried to change the flags in eesupp/inc/CPP_EEOPTIONS.h; however, as it is written in the documentation (https://mitgcm.readthedocs.io/en/latest/getting_started/getting_started.html#preprocessor-execution-environment-options), these updates did not solve the problem (as shown in Figure 1), since my runs are non-hydrostatic, therefore using the 3D conjugate gradient solver. Moreover, as you noticed, the simulation is more than 10 times slower, so the option is not feasible.
> 
> I also tried to look at higher frequency outputs to better detect when and where the issue starts. I tested 4 runs, keeping the same number of tiles along the longitude (nPx = 26), as follows:
>  1: 260 processes, with 10 tiles in latitude and therefore sNy = 30;
>  2: 390 processes, with 15 tiles in latitude, sNy = 20;
>  3: 520 processes, with 20 tiles in latitude, sNy = 15;
>  4: 650 processes, with 25 tiles in latitude, sNy = 12.
> In Figures 2-4 I plotted maps of the differences between them (hourly diagnostics).
> Figure 2 highlights an issue with the OBCS package: in all the simulations a 15-cell-wide sponge layer is active at the southern boundary, and a strip of non-zero values appears at the inner edge of the sponge layer.
> In simulations where the tile dimension is larger than the sponge layer thickness, these discrepancies arise in the domain interior (Figure 3). In other words, the problem at the boundary originates earlier and somehow overwrites the issue in the domain interior, but, at the end of the story, the result is qualitatively the same (Figure 4), with very remarkable differences after some days of simulation (e.g. if we want to compare SST with satellite observations).
> 
> I also did a year-long test to check the long-term behaviour of the differences: as expected, they do not diverge, keeping within the same range observed in the monthly tests.
> Please let me know if you have other suggestions for further tests, or if we simply have to take into account this intrinsic numerical "variability" (therefore preferring longer-term averages when comparing model outputs).
> 
> Thank you
> Sincerely,
> Fabio Giordano
> Sezione di Oceanografia
> Istituto Nazionale di Oceanografia e di Geofisica Sperimentale - OGS
> via Beirut n. 2
> 34151 Trieste - Italia
> [Attachments: Figure1.png, Figure2.png, Figure3.png, Figure4.png]