[MITgcm-devel] [MITgcm-support] SOLVE_DIAGONAL_LOWMEMORY
Martin Losch
Martin.Losch at awi.de
Fri Jan 8 11:20:29 EST 2021
OK, of course, I forgot about implicitDiffusion and viscosity.
I was trying to make the LOWMEMORY code work with the adjoint. I am down to 5 recomputation warnings for solve_pentadiagonal and 2 for solve_tridiagonal, but the adjoint results do not seem to match those of the default code, so I was trying to get this tested.
Is this something foolish to do?
Martin
> On 8. Jan 2021, at 16:18, Jean-Michel Campin <jmc at mit.edu> wrote:
>
> Hi Martin,
>
> I see that SOLVE_DIAGONAL_LOWMEMORY is defined in shelfice_2d_remesh, and it looks to me like it's used
> there; in shelfice_2d_remesh/input/data:
> implicitDiffusion = .TRUE.,
> implicitViscosity = .TRUE.,
> selectImplicitDrag = 2,
> I am surprised that it's not defined in global_ocean.cs32x15/code/CPP_OPTIONS.h since I found that,
> in global_ocean.cs32x15/input.thsice/data, the 3 parameters listed above are set the same way,
> and in this case it would be more efficient (skipping half of the 2nd solve) but I guess it's
> also a way to test selectImplicitDrag=2 without SOLVE_DIAGONAL_LOWMEMORY.
>
> And currently, I am not expecting SOLVE_DIAGONAL_LOWMEMORY to work for the adjoint,
> so it's not clear why I would try to turn it on for global_ocean.cs32x15/input_ad
>
> Cheers,
> Jean-Michel
>
> On Fri, Jan 08, 2021 at 03:01:29PM +0100, Martin Losch wrote:
>> Hi Jean-Michel,
>>
>> we do not seem to be testing the SOLVE_DIAGONAL_LOWMEMORY code anywhere (it's just compiled in shelfice2d_remesh) and I cannot use it in some of the experiments that will actually use solve_tridiagonal/pentadiagonal, because either it is not allowed (advect_xz.nlfs) or the model explodes (global_ocean.cs32x15/input_ad).
>>
>> How do you test it?
>>
>> Martin
>>
>>> On 8. Jan 2021, at 13:06, Martin Losch <Martin.Losch at awi.de> wrote:
>>>
>>> Hi Dimitris,
>>>
>>> Jean-Michel is right, I added the third option, which is now the default. Looking at the code today tells me that I have learned a few things since then. I guess the performance difference between default and SOLVE_DIAGONAL_LOWMEMORY is probably due to excessive if-statements in the innermost loops.
>>>
>>> I have tried to push these out of the i/j loops, and this even reduces the number of recomputation warnings from 5 to 1 for this routine. Maybe you would like to give this a try in terms of performance? It is here:
>>> https://github.com/mjlosch/MITgcm/tree/diagonal_lowmemory
>>>
>>> For just forward simulations, the LOWMEMORY option is probably even faster as Jean-Michel says. Maybe I can also implement Jean-Michel's trick to save even more computations.
>>>
>>> Martin
>>>
>>>> On 8. Jan 2021, at 04:20, Jean-Michel Campin <jmc at mit.edu> wrote:
>>>>
>>>> Hi Dimitris,
>>>>
>>>> In terms of efficiency, I would rank #define SOLVE_DIAGONAL_KINNER as the slowest,
>>>> then comes the default (just because accessing more 3-D arrays might take more time),
>>>> and finally the #define SOLVE_DIAGONAL_LOWMEMORY; but I would have guessed that the differences
>>>> between the last 2 options would have been small.
>>>> Now that Dan is reporting significant differences, I am still not sure whether the magnitude of
>>>> the improvement is platform/problem dependent.
>>>>
>>>> And just for history (please correct me if I am wrong):
>>>> The original version was the SOLVE_DIAGONAL_LOWMEMORY one. It's not great for the adjoint
>>>> (there are some obvious reasons, but I thought it could have been fixed without major changes
>>>> in the inner part of the routine), so Gael wrote a new version, SOLVE_DIAGONAL_KINNER,
>>>> that is better for the adjoint, but slower and terrible in terms of efficiency on vector machines.
>>>> After that, the current default version was introduced (maybe Martin did it?) so that
>>>> we would have an adjointable version that is efficient on vector processors.
>>>> And after that, I messed up the SOLVE_DIAGONAL_LOWMEMORY version even more, to skip
>>>> a good half of the inversion computation if/when it's called for the second time
>>>> with the same matrix but a different RHS.
>>>>
>>>> Cheers,
>>>> Jean-Michel
>>>>
>>>> On Thu, Jan 07, 2021 at 11:45:27AM -0800, Dimitris Menemenlis wrote:
>>>>> Bonjour Jean-Michel, would you be aware of any possible issues with switching the hi-res LLC simulations to SOLVE_DIAGONAL_LOWMEMORY ?
>>>>> Right now we use the default settings of:
>>>>>
>>>>> C o Choices for implicit solver routines solve_*diagonal.F
>>>>> C The following has low memory footprint, but not suitable for AD
>>>>> #undef SOLVE_DIAGONAL_LOWMEMORY
>>>>> C The following one suitable for AD but does not vectorize
>>>>> #undef SOLVE_DIAGONAL_KINNER
>>>>>
>>>>> But Dan notes that the low-memory variant is approximately twice as fast as the default settings for the llc_540 set-up.
>>>>> I could not find any previous discussion of SOLVE_DIAGONAL_LOWMEMORY in mitgcm-support.
>>>>>
>>>>> Merci, Dimitris
>>>>>
>>>>>
>>>>>> Begin forwarded message:
>>>>>>
>>>>>> From: "Kokron, Daniel S. (ARC-606.2)[InuTeq, LLC]" <daniel.s.kokron at nasa.gov>
>>>>>> Subject: Re: Performance analysis feedback
>>>>>> Date: January 7, 2021 at 10:49:23 AM PST
>>>>>> To: Dimitris Menemenlis <menemenlis at jpl.nasa.gov>
>>>>>> Cc: "Zhang, Hong (JPL-398K)[JPL Employee]" <hong.zhang at jpl.nasa.gov>
>>>>>>
>>>>>> Kinner 147.07/186.45
>>>>>> Fall through 60.37/125.83
>>>>>> Low mem 26.1/67.9
>>>>>>
>>>>>> See attached. Upper left is Kinner, upper right is fall through and lower middle is low mem. solve_tridiagonal() is highlighted in yellow.
>>>>>> Dan
>>>>>>
>>>>>> From: "Kokron, Daniel S. (ARC-606.2)[InuTeq, LLC]" <daniel.s.kokron at nasa.gov>
>>>>>> Date: Thursday, January 7, 2021 at 10:05 AM
>>>>>> To: Dimitris Menemenlis <menemenlis at jpl.nasa.gov>
>>>>>> Cc: "Zhang, Hong (JPL-398K)[JPL Employee]" <hong.zhang at jpl.nasa.gov>
>>>>>> Subject: Re: Performance analysis feedback
>>>>>>
>>>>>> Comparing profiles with and without Kinner, Kinner is definitely slower than the fall-through path. I did not try using the lowmem path.
>>>>>>
>>>>>> There is a lot of load imbalance among the ranks so I've included min and max times (s) spent in the solve_tridiagonal() routine using the llc_540 case run on 767 ranks.
>>>>>>
>>>>>> Kinner 147.07/186.45
>>>>>> Fall through 60.37/125.83
>>>>>>
>>>>>> Fall through is activated with
>>>>>> #undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> #undef SOLVE_DIAGONAL_KINNER
>>>>>>
>>>>>>
>>>>>> From: Dimitris Menemenlis <menemenlis at jpl.nasa.gov>
>>>>>> Date: Wednesday, January 6, 2021 at 7:57 PM
>>>>>> To: "Kokron, Daniel S. (ARC-606.2)[InuTeq, LLC]" <daniel.s.kokron at nasa.gov>
>>>>>> Cc: "Zhang, Hong (JPL-398K)[JPL Employee]" <hong.zhang at jpl.nasa.gov>
>>>>>> Subject: Re: Performance analysis feedback
>>>>>>
>>>>>> actually, it looks like we are "not" using the low-memory option in any of our set-ups:
>>>>>>
>>>>>> bash-3.2$ grep SOLVE_DIAGONAL_LOW */*/code*/*h */code*/*h
>>>>>> llc_540/tides_exp/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> llc_90/tides_exps/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> llc_1080/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> llc_2160/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> llc_270/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> llc_270/code_ad/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> llc_4320/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> llc_540/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> llc_90/code-async-noseaice/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> llc_90/code/CPP_OPTIONS.h:#undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>>
>>>>>> which happens to be the default for unmodified MITgcm CPP_OPTIONS.h
>>>>>>
>>>>>> bash-3.2$ grep SOLVE_DIAGONAL CPP_OPTIONS.h
>>>>>> #undef SOLVE_DIAGONAL_LOWMEMORY
>>>>>> #undef SOLVE_DIAGONAL_KINNER
>>>>>>
>>>>>> what do you recommend?
>>>>>>
>>>>>> Dimitris
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Jan 6, 2021, at 3:03 PM, Kokron, Daniel S. (ARC-606.2)[InuTeq, LLC] <daniel.s.kokron at nasa.gov> wrote:
>>>>>>>
>>>>>>> Hong,
>>>>>>> The Kinner tri-diagonal solver code path is showing up in my profiling of the llc_540 case. None of the other user cases I have is using this path. Does your investigation require using the Kinner path?
>>>>>>> Dan
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>> _______________________________________________
>>>>> MITgcm-support mailing list
>>>>> MITgcm-support at mitgcm.org
>>>>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
>>
>
>
>
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-devel
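
Editorial sketch (not MITgcm code): the trick Jean-Michel describes in the thread, skipping about half of the work when the tridiagonal solver is called a second time with the same matrix but a different right-hand side, can be illustrated with the standard Thomas algorithm. The version below is a minimal illustration in Python, assuming a single column and plain lists; the function names factor_tridiagonal and solve_with_factors are hypothetical and do not correspond to anything in solve_tridiagonal.F.

```python
def factor_tridiagonal(a, b, c):
    """Matrix-only part of the Thomas algorithm.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused).
    Returns the modified super-diagonal and the inverse pivots,
    which can be reused for any right-hand side.
    """
    n = len(b)
    inv_pivot = [0.0] * n
    c_prime = [0.0] * n
    inv_pivot[0] = 1.0 / b[0]
    c_prime[0] = c[0] * inv_pivot[0]
    for k in range(1, n):
        inv_pivot[k] = 1.0 / (b[k] - a[k] * c_prime[k - 1])
        c_prime[k] = c[k] * inv_pivot[k]
    return c_prime, inv_pivot


def solve_with_factors(a, c_prime, inv_pivot, rhs):
    """Right-hand-side part: forward sweep, then back substitution."""
    n = len(rhs)
    y = [0.0] * n
    y[0] = rhs[0] * inv_pivot[0]
    for k in range(1, n):
        y[k] = (rhs[k] - a[k] * y[k - 1]) * inv_pivot[k]
    for k in range(n - 2, -1, -1):
        y[k] -= c_prime[k] * y[k + 1]
    return y


if __name__ == "__main__":
    a = [0.0, 1.0, 1.0]
    b = [4.0, 4.0, 4.0]
    c = [1.0, 1.0, 0.0]
    # Factor once, then solve for two different right-hand sides.
    c_prime, inv_pivot = factor_tridiagonal(a, b, c)
    print(solve_with_factors(a, c_prime, inv_pivot, [6.0, 12.0, 14.0]))
    print(solve_with_factors(a, c_prime, inv_pivot, [4.0, 0.0, -4.0]))
```

The second call reuses c_prime and inv_pivot, so only the right-hand-side sweeps are repeated. Whether the saving is really "a good half" in the Fortran routine depends on details not shown in this thread.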