[MITgcm-devel] global sum

Mon Nov 26 11:50:20 EST 2012

And I need to store dtempTile in a local common block, right?

M.

On Nov 26, 2012, at 5:48 PM, Martin Losch wrote:

> Hi Jean-Michel,
> 
> in the context of seaice_fgmres.F (S/R scalprod, beginning at line 551), I then need to define a local array dtemptile(nsx,nsy) and pass bi,bj down to this routine, right?
> Then I compute dtemptile(bi,bj), just as I now compute dtemp, and then call
> CALL GLOBAL_SUM_TILE_RL( dtemptile,dtemp,myThid )
> 
> correct?
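
The pattern described here (one partial scalar product per tile, followed by a single reduction over the tile array) can be sketched in Python. This is only a conceptual illustration, not MITgcm code: the names scalprod_tiled and global_sum_tile, the nested-list tile layout, and the sample data are all assumptions made for the example, and the stand-in reduction ignores MPI and threads entirely.

```python
def scalprod_tiled(x_tiles, y_tiles):
    """Return a per-tile array of partial dot products (hypothetical helper).

    x_tiles[bj][bi] and y_tiles[bj][bi] hold the values owned by tile (bi,bj).
    """
    n_sy = len(x_tiles)
    n_sx = len(x_tiles[0])
    dtemp_tile = [[0.0] * n_sx for _ in range(n_sy)]
    for bj in range(n_sy):
        for bi in range(n_sx):
            partial = 0.0
            for a, b in zip(x_tiles[bj][bi], y_tiles[bj][bi]):
                partial += a * b
            # Each tile writes only its own slot, so tiles never
            # accumulate into one shared scalar.
            dtemp_tile[bj][bi] = partial
    return dtemp_tile


def global_sum_tile(dtemp_tile):
    """Stand-in for the reduction step: add the tile values in a fixed order."""
    total = 0.0
    for row in dtemp_tile:
        for value in row:
            total += value
    return total


# Two tiles in x, one in y, with two values per tile (made-up data).
x_tiles = [[[1.0, 2.0], [3.0, 4.0]]]
y_tiles = [[[5.0, 6.0], [7.0, 8.0]]]

dtemp_tile = scalprod_tiled(x_tiles, y_tiles)
dtemp = global_sum_tile(dtemp_tile)
print(dtemp_tile, dtemp)
```

The point of the tile array is that the accumulation into a single scalar happens in exactly one place, the reduction, rather than inside the per-tile loop.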
> 
> Martin
> 
> On Nov 26, 2012, at 5:40 PM, Jean-Michel Campin wrote:
> 
>> Hi Martin,
>> 
>> I recommend using global_sum_tile rather than _GLOBAL_SUM, for 2 reasons:
>> 1) with the default CPP_EEOPTIONS.h, it uses the same MPI calls as
>> _GLOBAL_SUM (so same speed), but it offers the option (with #define GLOBAL_SUM_SEND_RECV),
>> for a given domain decomposition in tiles (i.e., fixed tile size),
>> to get a result that is independent of how the tiles are distributed among processors
>> (you can change nSx,nSy,nPx,nPy and the result stays identical).
>> This is useful for checking that the code is right (but it is slower).
>> 2) it's easier to use because it "always" works, whether the arguments are
>> shared (e.g., in a common block) or local. By contrast, _GLOBAL_SUM
>> does not work with multi-threading if the argument is shared (is in a common block).
>> And since this issue is not so obvious to everyone, it's easy to forget about it
>> and end up with pieces of code that do not work with multi-threading.
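
Why the distribution of tiles among processors can change a global sum at all comes down to floating-point addition not being associative. The Python sketch below illustrates the idea only and makes no claim about how MITgcm implements it: the function names, the "owner" lists, and the tile values are hypothetical and chosen to exaggerate the effect.

```python
def sum_by_process(tile_sums, owner):
    """Each process first adds up its own tiles, then the process totals are added."""
    n_proc = max(owner) + 1
    per_proc = [0.0] * n_proc
    for tile, value in enumerate(tile_sums):
        per_proc[owner[tile]] += value
    total = 0.0
    for proc_total in per_proc:
        total += proc_total
    return total


def sum_in_tile_order(tile_sums, owner):
    """All tile values are added in global tile order; ownership plays no role."""
    total = 0.0
    for value in tile_sums:
        total += value
    return total


# Four per-tile partial sums (made-up values with a large dynamic range).
tile_sums = [1.0e17, 1.0, -1.0e17, 1.0]

# Two ways of assigning the same four tiles to two processes.
layout_a = [0, 0, 1, 1]
layout_b = [0, 1, 0, 1]

print(sum_by_process(tile_sums, layout_a), sum_by_process(tile_sums, layout_b))
print(sum_in_tile_order(tile_sums, layout_a), sum_in_tile_order(tile_sums, layout_b))
```

With these values the process-grouped sums differ between the two layouts, while the tile-ordered sum is the same for both, because the order of additions no longer depends on which process owns which tile.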
>> 
>> And regarding global_sum_single_cpu, it's much slower, so it cannot be used
>> as the default; as a consequence, it requires more specific coding (with
>> one version calling global_sum_single_cpu and a default version calling
>> global_sum_tile). But the advantage of this additional coding is that it offers
>> the option to get results that are independent of the domain decomposition in tiles.
>> 
>> Cheers,
>> Jean-Michel
>> 
>> On Mon, Nov 26, 2012 at 09:44:33AM +0100, Martin Losch wrote:
>>> Hi there,
>>> 
>>> I didn't follow the development of the global-sum code. Under which circumstances should I use which of these variants:
>>> _GLOBAL_SUM (as, e.g., in seaice_lsr for the residuals)
>>> call global_sum_tile
>>> call global_sum_single_cpu
>>> 
>>> Currently, I would like to figure out if I can actually fix the multithreading of the file seaice_fgmres.F.
>>> In there I use "stolen" code, which I don't quite know how to handle. In particular, there is a scalar product that I adjusted to use MPI, but now I would like it to use a global-sum variant.
>>> 
>>> Martin
>>> 
>>> PS. Related to this file, should I actually use LAPACK routines (actually BLAS), if HAVE_LAPACK is defined?
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel