<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">Hi Fabio,<div><br></div><div>thanks for these tests. I am not sure I can help you much further with this, as you have already figured out all the decomposition-related issues:</div><div><br></div><div>1. The non-hydrostatic code (cg3d.F) would need to support the CG2D_SINGLECPU_SUM flag to give reproducible results (possible, but not yet done).</div><div><br></div><div>2. The OBCS sponge code is not tile-proof, and we may want to say this somewhere in the documentation. The most likely reason is that each tile is flagged according to whether it contains an open boundary, and the OBCS operations (including the sponge) are carried out only on tiles with that flag set. With your sNy=12, the second tile no longer contains an open boundary, so it is not touched, even though part of your 15-cell sponge extends into it. I guess you can call this a bug. However, sNy < 30 may not be a good choice from a performance point of view, because the MPI and overlap overhead becomes large compared to the tile size. My suggestion is to make sure that the sponge layer width is smaller than the tile size. We could add a warning, or even a stop, if the chosen sponge is too wide compared to sNx/sNy. It would be nice if you could raise an issue on GitHub.</div><div><br></div><div>3. In a chaotic simulation (i.e. one where small differences in the initial conditions change the simulation dramatically), **any** small difference (in the initial conditions or of numerical origin) leads to very different results; this is a very general observation. The statistics (mean, standard deviation, and also higher moments) of the results should still be “the same”. 
In this sense, your observation in Figure 4 is no surprise.</div><div><br></div><div><br></div><div>I still don’t know whether the differences between runs with pkg/exch2 and with eesupp/src/exch are expected; maybe others can chime in.</div><div><br></div><div>Martin<br id="lineBreakAtBeginningOfMessage"><div><br><blockquote type="cite"><div>On 20. May 2024, at 16:57, Giordano, Fabio <fgiordano@ogs.it> wrote:</div><br class="Apple-interchange-newline"><div><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr"><div><div><div><div><div>Hi Martin,</div><div>Thanks for the very useful reply.<br></div><div>As regards your question ("I am assuming that the dotted lines are the same domain decompositions with the blank tiles excluded (exch2?)."), yes, the dotted lines refer to exactly the same setups, but with the EXCH2 package activated.<br><br></div>I ran some more tests. As you suggested, I tried changing the flags in eesupp/inc/CPP_EEOPTIONS.h; however, as stated in the documentation (<a href="https://mitgcm.readthedocs.io/en/latest/getting_started/getting_started.html#preprocessor-execution-environment-options">https://mitgcm.readthedocs.io/en/latest/getting_started/getting_started.html#preprocessor-execution-environment-options</a>), these changes did not solve the problem (Figure 1), because my runs are non-hydrostatic and therefore use the 3D conjugate gradient solver. Moreover, as you noticed, the simulation is more than 10 times slower with these flags, so this option is not feasible.<br><br></div><div>I also looked at higher-frequency output to better pinpoint when and where the issue starts. 
I tested four runs, keeping the same number of tiles in longitude (nPx = 26):</div><div> 1: 260 processes, with 10 tiles in latitude and therefore sNy = 30;</div><div> 2: 390 processes, with 15 tiles in latitude, sNy = 20;</div><div> 3: 520 processes, with 20 tiles in latitude, sNy = 15;</div><div> 4: 650 processes, with 25 tiles in latitude, sNy = 12.<br></div><div>Figures 2-4 show maps of the differences between these runs (hourly diagnostics).<br>Figure 2 highlights an issue with the OBCS package: in all simulations a sponge layer 15 cells wide is active at
the southern boundary, where a strip of non-zero values appears at the inner edge of the sponge layer. <br></div><div>In simulations where the tile size is larger than the sponge-layer width, the discrepancies instead arise in the domain interior (Figure 3). In other words, where the boundary problem occurs, it appears earlier and masks the discrepancies in the interior; in the end, however, the result is qualitatively the same (Figure 4), with substantial differences after a few days of simulation (which matters, e.g., if we want to compare SST with satellite observations).<br></div><br></div>I also ran a year-long test to check the long-term behaviour of the differences: as expected, they do not grow without bound but remain within the same range observed in the monthly tests.<br></div>Please let me know if you have other suggestions for further tests, or whether we simply have to accept this intrinsic numerical "variability" (and therefore prefer longer-term averages when comparing model outputs).<br></div><div><br></div><div>Thank you<br></div>Sincerely,<br><div><p lang="en-GB" style="line-height:100%;margin-bottom:0in;background:transparent"><font face="monospace">Fabio Giordano</font><font color="#888888"></font></p><div style="color:rgb(34,34,34)"><font color="#888888"><font face="monospace">Sezione di Oceanografia</font></font></div><div><font color="#888888"><font face="monospace"><font color="#222222">Istituto Nazionale di Oceanografia e di Geofisica Sperimentale - OGS</font><br></font></font></div><div style="color:rgb(34,34,34)"><font color="#888888"><font face="monospace">via Beirut n. 2</font></font></div><div style="color:rgb(34,34,34)"><font color="#888888"><font face="monospace">34151 Trieste - Italia</font></font></div></div></div>
<span id="cid:f_lwf39em02"><Figure3.png></span><span id="cid:f_lwf39elo0"><Figure1.png></span><span id="cid:f_lwf39elw1"><Figure2.png></span><span id="cid:f_lwf39em33"><Figure4.png></span>_______________________________________________<br>MITgcm-support mailing list<br>MITgcm-support@mitgcm.org<br>http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support<br></div></blockquote></div><br></div></body></html>