[MITgcm-devel] global sum

Martin Losch Martin.Losch at awi.de
Mon Nov 26 11:53:47 EST 2012


I guess the answer is "no". This will not work. I have to think about this some more, but if you have suggestions, I'd be happy to hear them.

Martin
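
For anyone following along: the two-step pattern Jean-Michel describes below (each tile first reduces its own points into a partial sum, dtempTile(bi,bj), and the partials are then combined over tiles in a fixed (bi,bj) order) can be sketched outside Fortran. The Python below is only an illustration of the idea, not MITgcm code; all names in it are hypothetical stand-ins.

```python
# Hypothetical illustration (Python, not MITgcm Fortran) of the two-step
# tile sum behind GLOBAL_SUM_TILE_RL: step 1 reduces each tile to one
# partial sum; step 2 combines the partials in a fixed tile order.
# Because the reduction order over tiles is fixed, the result does not
# depend on how tiles are distributed among processes or threads.

def tile_partial_sums(field, tiles):
    """Step 1: one partial sum per tile (what dtempTile(bi,bj) holds)."""
    return [sum(field[i] for i in tile) for tile in tiles]

def global_sum_tile(partials):
    """Step 2: accumulate the per-tile partials in fixed (bi,bj) order,
    mimicking the reproducible reduction of the SEND_RECV variant."""
    total = 0.0
    for p in partials:      # fixed order over all tiles
        total += p
    return total

# A small 1-D "field" split into fixed-size tiles; changing which
# process owns which tile would not change partials or their order.
field = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
tiles = [[0, 1], [2, 3], [4, 5]]
partials = tile_partial_sums(field, tiles)
total = global_sum_tile(partials)
```

The point of the sketch: once the tile layout is fixed, the floating-point summation order is fixed too, which is why the result can stay bit-identical when only the tile-to-process mapping (nSx,nSy,nPx,nPy) changes.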

On Nov 26, 2012, at 5:50 PM, Martin Losch wrote:

> And I need to store dtempTile in a local common block, right?
> 
> M.
> 
> On Nov 26, 2012, at 5:48 PM, Martin Losch wrote:
> 
>> Hi Jean-Michel,
>> 
>> in the context of seaice_fgmres.F (S/R scalprod, beginning at line 551) I then need to define a local array dtempTile(nSx,nSy) and pass bi,bj down to this routine, right?
>> Then I compute dtempTile(bi,bj) where I now compute dtemp, and then call
>> CALL GLOBAL_SUM_TILE_RL( dtempTile, dtemp, myThid )
>> 
>> correct?
>> 
>> Martin
>> 
>> On Nov 26, 2012, at 5:40 PM, Jean-Michel Campin wrote:
>> 
>>> Hi Martin,
>>> 
>>> I recommend using global_sum_tile rather than _GLOBAL_SUM, for two reasons:
>>> 1) With the default CPP_EEOPTIONS.h, it uses the same MPI calls as 
>>> _GLOBAL_SUM (so same speed), but it offers the option (with #define GLOBAL_SUM_SEND_RECV),
>>> for a given domain decomposition in tiles (i.e., fixed tile size), 
>>> of getting a result that is independent of how the tiles are distributed among processes
>>> (nSx,nSy,nPx,nPy can change and the result stays identical). 
>>> This is useful for checking that the code is right (but it is slower).
>>> 2) It is easier to use because it "always" works, whether the arguments are
>>> shared (e.g., in a common block) or local. By contrast, _GLOBAL_SUM
>>> does not work with multiple threads if the argument is shared (i.e., in a common block).
>>> And since this issue is not so obvious to everyone, it is easy to forget about it
>>> and end up with pieces of code that do not work with multiple threads.
>>> 
>>> And regarding global_sum_single_cpu: it is much slower, so it cannot be used 
>>> as the default; as a consequence, it requires more specific coding (with 
>>> one version calling global_sum_single_cpu and a default version calling 
>>> global_sum_tile). But the advantage of this additional coding is that it offers
>>> the option of getting results independent of the domain decomposition in tiles.
>>> 
>>> Cheers,
>>> Jean-Michel
>>> 
>>> On Mon, Nov 26, 2012 at 09:44:33AM +0100, Martin Losch wrote:
>>>> Hi there,
>>>> 
>>>> I didn't follow the development of the global-sum code. Under which circumstances should I use which of these variants:
>>>> _GLOBAL_SUM (as, e.g., in seaice_lsr for the residuals)
>>>> call global_sum_tile
>>>> call global_sum_single_cpu
>>>> 
>>>> Currently, I would like to figure out whether I can actually fix the multithreading of the file seaice_fgmres.F.
>>>> In there I use "stolen" code that I don't quite know how to handle. In particular, there is a scalar product that I adjusted to use MPI, but now I would like it to use a global-sum variant.
>>>> 
>>>> Martin
>>>> 
>>>> PS: Related to this file, should I actually use LAPACK routines (actually BLAS) if HAVE_LAPACK is defined?
>>>> _______________________________________________
>>>> MITgcm-devel mailing list
>>>> MITgcm-devel at mitgcm.org
>>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>>> 
>> 
>> 
> 
> 



