[MITgcm-devel] [EXTERNAL] pickups

Oliver Jahn jahn at mit.edu
Tue Jun 9 11:30:44 EDT 2020


Hi Martin,

it may not help if you only have one kind of CPU, but if you have, say,
both Haswell and Broadwell, I would add it, and make sure you have
either "-fp-model source" or "-fp-model precise".
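
A rough Python illustration of why the instruction set can matter. This is
only a sketch, not MITgcm or compiler code, and the helper names `lane_sum`
and `fused_multiply_add` are made up for this example. It assumes two
plausible mechanisms: wider vector units can split a sum into a different
number of partial sums, and targets newer than AVX typically also offer a
fused multiply-add that rounds once instead of twice. Either one can change
the last bits of a result.

```python
from fractions import Fraction


def lane_sum(values, width):
    """Sum values using `width` interleaved partial sums, like a vectorized loop."""
    lanes = [0.0] * width
    for i, v in enumerate(values):
        lanes[i % width] += v
    total = 0.0
    for lane in lanes:
        total += lane
    return total


def fused_multiply_add(a, b, c):
    """Emulate a*b + c with one rounding: compute exactly, then round once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Same numbers, different grouping of the additions
values = [1e16, 1.0, -1e16, 1.0]
scalar = lane_sum(values, 1)    # strictly left-to-right
two_lane = lane_sum(values, 2)  # two partial sums, combined at the end

# Same expression, rounded twice (multiply, then add) versus once (fused)
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
unfused = a * b + c
fused = fused_multiply_add(a, b, c)

print(scalar, two_lane)
print(unfused, fused)
```

With these inputs the two groupings give 1.0 and 2.0, and the unfused result
is exactly zero while the fused one is not. In a chaotic, eddying run such
last-bit differences would be expected to grow over time.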

Oliver


On 2020-06-09 10:41, Martin Losch wrote:
> Hi Oliver,
> 
> thanks! So do I get this right: I should probably add -xAVX to the build_options files for ollie with ifort?
> 
> Martin
> 
>> On 9. Jun 2020, at 16:00, Oliver Jahn <jahn at mit.edu> wrote:
>>
>> Hi Martin,
>>
>> on our cluster, I've noticed non-exact restarts when using instruction
>> sets > AVX with the intel compiler, i.e., AVX2, AVX512, etc.
>> Optimization level was only -O2.  On Skylake these happened even between
>> runs on the same node.  I believe on Haswell and Broadwell, it only
>> happened when running on a different kind of cpu.  There are compiler
>> options that are supposed to help with this (-fltconsistency, -fp-model
>> consistent, ...), but the only way I could resolve it for the intel 2018
>> compiler was to always specify -xAVX (and "-fp-model source").  With
>> this, I've been getting perfect restarts even with -O3, and across
>> Sandybridge, Haswell, Broadwell and, I believe, also Skylake.
>>
>> I agree with Dimitris that switching compilers and operating system
>> updates often cause the same problem.
>>
>> Oliver
>>
>>
>> On 2020-06-09 08:12, Martin Losch wrote:
>>> to be clear, we use the same executable, maybe not on the same nodes.
>>>
>>> So Dimitris, you are saying, that this is quite normal?
>>>
>>> Martin
>>>
>>>> On 9. Jun 2020, at 14:10, Dimitris Menemenlis <dmenemenlis at gmail.com> wrote:
>>>>
>>>> Hi Martin, I don’t have experience with Cray, but on the Pleiades supercomputer (https://www.nas.nasa.gov/hecc/resources/pleiades.html) this non-repeatability for eddying MITgcm configurations is pretty common.  I am pretty sure that I have seen it occur when:
>>>> - we use maximum optimization, which does not guarantee that precise order of operations will always be the same,
>>>> - we use recompiled code with different compilers,
>>>> - we use the same executable but on different nodes of the machine, e.g., Broadwell instead of Haswell, or
>>>> - we use the same executable and nodes after operating system upgrades.
>>>>
>>>> Dimitris
>>>>
>>>>> On Jun 9, 2020, at 4:56 AM, Martin Losch <Martin.Losch at awi.de> wrote:
>>>>>
>>>>> Hi Jean-Michel and others,
>>>>>
>>>>> on our Cray CS400 ollie, but also elsewhere without the thorough experimentation, we experience the following issue with pickup files. Yuqing has carried out the following experiment with the 4km Arctic configuration (e.g. Gunnar Spreen et al. 2017): she ran two model simulations, one with a pickup interval of 3 days, i.e. it stops every 3 days and restarts, and the other with an interval of 10 days. If the pickups were perfect, these two runs should be the same. Bottom line: they are not. Please find attached some time series plots based on daily averages illustrating this:
>>>>>
>>>>> fig 1: RMS difference between 3 day and 10 day pchk of sea ice/ surface ocean diagnostics in Jan 2001.
>>>>>
>>>>> fig 2: RMS difference between 3 day and 10 day pchk of sea ice/ surface ocean diagnostics for the full year of 2001
>>>>>
>>>>> fig 3: Mean difference between 3 day and 10 day pchk of sea ice and ocean diagnostics in Jan 2001
>>>>>
>>>>> So apparently the two runs diverge until they seem to reach some kind of “steady state” of RMS differences. An RMS difference of 2cm for EtaN is not small, I believe. In this configuration (which uses seaice, kpp, cal/exf, obcs, salt_plume and exch2), we don’t do anything fancy or experimental.
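
For reference, an RMS difference like the one in the figures can be computed
along these lines. This is only a sketch, assuming the daily averages of each
run have already been read into flat lists of equal length. The function
`rms_difference` and the sample numbers are hypothetical, not the actual
analysis script or model output.

```python
import math


def rms_difference(run_a, run_b):
    """Root-mean-square difference between two equally long sequences."""
    if len(run_a) != len(run_b):
        raise ValueError("both runs must have the same number of points")
    squared = [(x - y) ** 2 for x, y in zip(run_a, run_b)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up sea surface height values (m) for one day from the two runs
eta_3day = [0.10, -0.25, 0.40]
eta_10day = [0.10, -0.25, 0.40]

# Identical fields give zero, which is what perfect pickups should produce
print(rms_difference(eta_3day, eta_10day))
```

Applied to each daily average in turn, this gives one value per day, i.e. a
time series of the kind described for fig 1 and fig 2.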
>>>>>
>>>>> Have you seen something like this before?
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>> <10-3day-pick-up-diff_RMS.png><10-3day-pick-up-diff_RMS_annual.png><10-3day-pick-up-diff.png>
>>>>>
>>>>> _______________________________________________
>>>>> MITgcm-devel mailing list
>>>>> MITgcm-devel at mitgcm.org
>>>>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-devel