[MITgcm-support] SOLVE_DIAGONAL_LOWMEMORY

Martin Losch Martin.Losch at awi.de
Fri Jan 8 07:06:28 EST 2021


Hi Dimitris,

Jean-Michel is right, I added the third option, which is now the default. Looking at the code today tells me that I have learned a few things since then. I guess the performance difference between the default and SOLVE_DIAGONAL_LOWMEMORY is probably due to excessive if-statements in the innermost loops.

I have tried to push these out of the i/j loops, and this even reduces the number of recomputation warnings from 5 to 1 for this routine. Maybe you would like to give this a try in terms of performance? It is here:
https://github.com/mjlosch/MITgcm/tree/diagonal_lowmemory
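
For readers unfamiliar with the idea, the following is a minimal sketch of moving a test out of the i/j loops. It is not the MITgcm routine: the function names, the `use_mask` flag and the `mask` array are hypothetical, and it assumes the test does not depend on i or j, so that it can be evaluated once instead of at every grid point.

```python
def apply_branch_inside(field, mask, use_mask):
    """Reference version: the flag is tested at every (j, i) point."""
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            if use_mask:  # same answer for every i and j
                out[j][i] = field[j][i] * mask[j][i]
            else:
                out[j][i] = field[j][i]
    return out


def apply_branch_hoisted(field, mask, use_mask):
    """Same result, but the flag is tested once, outside the i/j loops."""
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    if use_mask:
        for j in range(ny):
            for i in range(nx):
                out[j][i] = field[j][i] * mask[j][i]
    else:
        for j in range(ny):
            for i in range(nx):
                out[j][i] = field[j][i]
    return out
```

The hoisted form duplicates the loop nest, which is the usual price for a branch-free inner loop. Whether this actually pays off depends on the compiler and platform.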

For just forward simulations, the LOWMEMORY option is probably even faster as Jean-Michel says. Maybe I can also implement Jean-Michel’s trick to save even more computations.
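
As an illustration of why a second call with the same matrix but a different right-hand side can be cheaper, here is a minimal sketch of the textbook tridiagonal (Thomas) algorithm split into a matrix-only step and a step that depends on the right-hand side. This is an assumption about the kind of saving meant, not a transcription of solve_tridiagonal; the function names and the array layout are hypothetical.

```python
def factor_tridiagonal(a, b, c):
    """Matrix-only part of the forward sweep.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused). No pivoting, so the
    matrix is assumed to be well conditioned (e.g. diagonally dominant).
    """
    n = len(b)
    inv_bet = [0.0] * n  # reciprocals of the modified diagonal
    gam = [0.0] * n      # modified super-diagonal coefficients
    inv_bet[0] = 1.0 / b[0]
    for k in range(1, n):
        gam[k] = c[k - 1] * inv_bet[k - 1]
        inv_bet[k] = 1.0 / (b[k] - a[k] * gam[k])
    return inv_bet, gam


def solve_with_factors(a, inv_bet, gam, rhs):
    """Right-hand-side part: forward substitution, then back substitution."""
    n = len(rhs)
    y = [0.0] * n
    y[0] = rhs[0] * inv_bet[0]
    for k in range(1, n):
        y[k] = (rhs[k] - a[k] * y[k - 1]) * inv_bet[k]
    for k in range(n - 2, -1, -1):
        y[k] -= gam[k + 1] * y[k + 1]
    return y


# Factor once, then reuse the stored coefficients for two right-hand sides.
a = [0.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0]
c = [-1.0, -1.0, 0.0]
inv_bet, gam = factor_tridiagonal(a, b, c)
x1 = solve_with_factors(a, inv_bet, gam, [0.0, 0.0, 4.0])
x2 = solve_with_factors(a, inv_bet, gam, [1.0, 0.0, 1.0])
```

The trade-off is that `inv_bet` and `gam` have to be kept between calls, so the saving in computation costs some extra storage.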

Martin

> On 8. Jan 2021, at 04:20, Jean-Michel Campin <jmc at mit.edu> wrote:
> 
> Hi Dimitris,
> 
> In terms of efficiency, I would rank #define SOLVE_DIAGONAL_KINNER as the slowest,
> and then comes the default (just because accessing more 3-D arrays might take more time)
> and finally the #define SOLVE_DIAGONAL_LOWMEMORY ; but I would have guessed that the differences
> would have been small between the last 2 options. 
> Now that Dan is reporting significant differences, I am still not sure if the magnitude of
> the improvement is not platform/problem dependent.
> 
> And just for history (please correct me if I am wrong):
> The original version was the SOLVE_DIAGONAL_LOWMEMORY . It's not great for the adjoint
> (there are some obvious reasons, but I thought it could have been fixed without major changes
> in the inner part of the routine) so Gael wrote a new version, SOLVE_DIAGONAL_KINNER,
> that is better for the adjoint, but slower and terrible in terms of efficiency on vector machines.
> After that the current default version was introduced (maybe Martin did it?) so that
> we would have an adjointable version that is efficient on vector processors.
> And after that, I messed up the SOLVE_DIAGONAL_LOWMEMORY version even more to skip
> a good half of the inversion computation if/when it's called for the second time
> with the same matrix but different RHS.
> 
> Cheers,
> Jean-Michel
> 
> On Thu, Jan 07, 2021 at 11:45:27AM -0800, Dimitris Menemenlis wrote:
>> Bonjour Jean-Michel, would you be aware of any possible issues with switching the hi-res LLC simulations to SOLVE_DIAGONAL_LOWMEMORY ?
>> Right now we use the default settings of:
>> 
>> C o Choices for implicit solver routines solve_*diagonal.F
>> C   The following has low memory footprint, but not suitable for AD
>> #undef SOLVE_DIAGONAL_LOWMEMORY
>> C   The following one suitable for AD but does not vectorize
>> #undef SOLVE_DIAGONAL_KINNER
>> 
>> But Dan notes that the low-memory variant is approximately twice as fast as the default settings for the llc_540 set-up.
>> I could not find any previous discussion of SOLVE_DIAGONAL_LOWMEMORY in mitgcm-support.
>> 
>> Merci, Dimitris
>> 
>> 
>>> Begin forwarded message:
>>> 
>>> From: "Kokron, Daniel S. (ARC-606.2)[InuTeq, LLC]" <daniel.s.kokron at nasa.gov>
>>> Subject: Re: Performance analysis feedback
>>> Date: January 7, 2021 at 10:49:23 AM PST
>>> To: Dimitris Menemenlis <menemenlis at jpl.nasa.gov>
>>> Cc: "Zhang, Hong (JPL-398K)[JPL Employee]" <hong.zhang at jpl.nasa.gov>
>>> 
>>> Kinner          147.07/186.45
>>> Fall through     60.37/125.83
>>> Low mem          26.1/67.9
>>> 
>>> See attached.  Upper left is Kinner, upper right is fall through and lower middle is low mem.  solve_tridiagonal() is highlighted in yellow.
>>> Dan
>>> 
>>> From: "Kokron, Daniel S. (ARC-606.2)[InuTeq, LLC]" <daniel.s.kokron at nasa.gov>
>>> Date: Thursday, January 7, 2021 at 10:05 AM
>>> To: Dimitris Menemenlis <menemenlis at jpl.nasa.gov>
>>> Cc: "Zhang, Hong (JPL-398K)[JPL Employee]" <hong.zhang at jpl.nasa.gov>
>>> Subject: Re: Performance analysis feedback
>>> 
>>> Comparing profiles with and without Kinner, Kinner is definitely slower than the fall-through path.  I did not try using the lowmem path.
>>> 
>>> There is a lot of load imbalance among the ranks so I've included min and max times (s) spent in the solve_tridiagonal() routine using the llc_540 case run on 767 ranks.
>>> 
>>> Kinner          147.07/186.45
>>> Fall through     60.37/125.83
>>> 
>>> Fall through is activated with
>>> #undef SOLVE_DIAGONAL_LOWMEMORY
>>> #undef SOLVE_DIAGONAL_KINNER
>>> 
>>> 
>>> From: Dimitris Menemenlis <menemenlis at jpl.nasa.gov>
>>> Date: Wednesday, January 6, 2021 at 7:57 PM
>>> To: "Kokron, Daniel S. (ARC-606.2)[InuTeq, LLC]" <daniel.s.kokron at nasa.gov>
>>> Cc: "Zhang, Hong (JPL-398K)[JPL Employee]" <hong.zhang at jpl.nasa.gov>
>>> Subject: Re: Performance analysis feedback
>>> 
>>> actually, it looks like we are "not" using the low-memory option in any of our set-ups:
>>> 
>>> bash-3.2$ grep SOLVE_DIAGONAL_LOW */*/code*/*h */code*/*h 
>>> llc_540/tides_exp/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>> llc_90/tides_exps/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>> llc_1080/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>> llc_2160/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>> llc_270/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>> llc_270/code_ad/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>> llc_4320/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>> llc_540/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>> llc_90/code-async-noseaice/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>> llc_90/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>> 
>>> which happens to be the default for unmodified MITgcm CPP_OPTIONS.h
>>> 
>>> bash-3.2$ grep SOLVE_DIAGONAL CPP_OPTIONS.h 
>>> #undef SOLVE_DIAGONAL_LOWMEMORY
>>> #undef SOLVE_DIAGONAL_KINNER
>>> 
>>> what do you recommend?
>>> 
>>> Dimitris
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Jan 6, 2021, at 3:03 PM, Kokron, Daniel S. (ARC-606.2)[InuTeq, LLC] <daniel.s.kokron at nasa.gov> wrote:
>>>> 
>>>> Hong,
>>>> The Kinner tri-diagonal solver code path is showing up in my profiling of the llc_540 case.  None of the other user cases I have is using this path.  Does your investigation require using the Kinner path?
>>>> Dan
>>>> 
>>> 
>>> 
>> 
> 
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
> 
