[MITgcm-devel] further vectorization

Martin Losch Martin.Losch at awi.de
Thu Nov 1 12:07:30 EDT 2007


That's OK for me too, but why not move the barrier into the if-block?
Do we need to wait for every thread before evaluating
"IF ( doingSingleThreadedComms )"?

M.
On 31 Oct 2007, at 16:32, Jean-Michel Campin wrote:

> Hi Martin,
>
> Chris is looking at your suggestions.
> Regarding the BARRIER thing, I was wondering if something
> like:
>       IF ( nSx.NE.1 .OR. nSy.NE.1 ) THEN
>        _BARRIER
>       ENDIF
> would do it. Since nSx & nSy are parameters, the compiler should
> know that it can remove those barriers when it's safe
> (both nSx & nSy = 1).
> And when nSx > 1 or nSy > 1, we can still use the same
> executable for single-threaded or multi-threaded runs
> just by changing eedata.
>
> Jean-Michel
>
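
To illustrate the guard suggested above: below is a minimal sketch in free-form Fortran, not MITgcm code. It assumes nSx and nSy are compile-time PARAMETERs (as stated above); the module name, barrier_stub and barriers_executed are hypothetical, and barrier_stub only counts calls in place of whatever _BARRIER expands to. With both parameters equal to 1 the condition is a compile-time constant, so the guarded call is never reached and an optimizing compiler is free to drop it.

```fortran
! Sketch only: hypothetical names, not taken from the MITgcm source.
module barrier_guard_sketch
  implicit none
  ! Tiles per process in x and y; compile-time constants (assumed values).
  integer, parameter :: nSx = 1, nSy = 1
  ! Counts how often the stand-in barrier was actually called.
  integer :: barrierCount = 0
contains

  ! Hypothetical stand-in for the real barrier: it only counts calls.
  subroutine barrier_stub()
    barrierCount = barrierCount + 1
  end subroutine barrier_stub

  ! Run the guarded barrier nCalls times and return how many
  ! barrier calls were really executed.
  integer function barriers_executed(nCalls)
    integer, intent(in) :: nCalls
    integer :: n
    barrierCount = 0
    do n = 1, nCalls
      ! Constant condition: false when nSx = nSy = 1.
      if (nSx /= 1 .or. nSy /= 1) then
        call barrier_stub()
      end if
    end do
    barriers_executed = barrierCount
  end function barriers_executed

end module barrier_guard_sketch
```

With nSx = nSy = 1 as set here, barriers_executed(10) returns 0; setting either parameter to 2 and recompiling gives 10.
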
> On Wed, Oct 31, 2007 at 08:59:08AM +0100, Martin Losch wrote:
>> Hi all,
>>
>> Jens-Olaf has identified (and fixed) another (small) bottleneck in
>> exch_rl_recv_get_x and exch_rl_send_put_x (and all the other files
>> that are created from the corresponding template):
>>
>> The problem: the inner loop is always over i, but for the *_x
>> routines this loop is very short (basically Olx). Because the loop
>> boundaries are not available at compile time (iMin and iMax are set
>> earlier in the routine), only the inner loop is vectorized, resulting
>> in slow code (vectorization is at 20%): the routines are among the 20
>> most expensive ones.
>> This is his suggestion (2 instances, for east and west buffers):
>>>          DO K=1,myNz
>>> !CDIR NOLOOPCHG
>>>           DO I=iMin,iMax
>>>            DO J=1,sNy
>>>             iB = iB + 1
>>>             array(I,J,K,bi,bj) = eastRecvBuf_RL(iB,eBl,bi,bj)
>>>            ENDDO
>>>           ENDDO
>>>          ENDDO
>> that is, exchange the loop order and add a directive so that the
>> compiler does not change the order back. I would suggest wrapping
>> the directive in #ifdef TARGET_NEC_SX / #endif.
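
One consequence of the interchange worth spelling out: because iB is incremented in loop order, swapping the I and J loops also changes the order in which values sit in the buffer, so the matching send_put and recv_get loops have to be interchanged together. The following is a minimal, self-contained sketch of that point in free-form Fortran, not the actual template code. The module name, the routines pack_x, unpack_x and unpack_x_old_order, the helper halo_matches and the tile sizes are all made up for illustration, and the bi/bj tile indices are dropped.

```fortran
! Sketch only: names and sizes are illustrative assumptions,
! not taken from the MITgcm exch templates.
module exch_loop_order_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: sNx = 8, sNy = 64, OLx = 2, Nr = 3
contains

  ! Pack columns iMin..iMax into buf with the long J loop innermost.
  subroutine pack_x(array, iMin, iMax, buf)
    real(dp), intent(in)  :: array(1-OLx:sNx+OLx, sNy, Nr)
    integer,  intent(in)  :: iMin, iMax
    real(dp), intent(out) :: buf(:)
    integer :: i, j, k, iB
    iB = 0
    do k = 1, Nr
!CDIR NOLOOPCHG
      do i = iMin, iMax
        do j = 1, sNy
          iB = iB + 1
          buf(iB) = array(i, j, k)
        end do
      end do
    end do
  end subroutine pack_x

  ! Unpack buf into columns iMin..iMax using the same (new) loop order.
  subroutine unpack_x(buf, iMin, iMax, array)
    real(dp), intent(in)    :: buf(:)
    integer,  intent(in)    :: iMin, iMax
    real(dp), intent(inout) :: array(1-OLx:sNx+OLx, sNy, Nr)
    integer :: i, j, k, iB
    iB = 0
    do k = 1, Nr
!CDIR NOLOOPCHG
      do i = iMin, iMax
        do j = 1, sNy
          iB = iB + 1
          array(i, j, k) = buf(iB)
        end do
      end do
    end do
  end subroutine unpack_x

  ! Unpack with the original order (short I loop innermost).
  subroutine unpack_x_old_order(buf, iMin, iMax, array)
    real(dp), intent(in)    :: buf(:)
    integer,  intent(in)    :: iMin, iMax
    real(dp), intent(inout) :: array(1-OLx:sNx+OLx, sNy, Nr)
    integer :: i, j, k, iB
    iB = 0
    do k = 1, Nr
      do j = 1, sNy
        do i = iMin, iMax
          iB = iB + 1
          array(i, j, k) = buf(iB)
        end do
      end do
    end do
  end subroutine unpack_x_old_order

  ! Copy the last OLx interior columns of a into the western halo of b
  ! and report whether the halo matches the source columns.
  logical function halo_matches(oldUnpackOrder)
    logical, intent(in) :: oldUnpackOrder
    real(dp) :: a(1-OLx:sNx+OLx, sNy, Nr), b(1-OLx:sNx+OLx, sNy, Nr)
    real(dp) :: buf(OLx*sNy*Nr)
    integer :: i, j, k
    ! Give every cell a unique value so misplaced data is detectable.
    do k = 1, Nr
      do j = 1, sNy
        do i = 1-OLx, sNx+OLx
          a(i, j, k) = real(i + 100*j + 10000*k, dp)
        end do
      end do
    end do
    b = 0.0_dp
    call pack_x(a, sNx-OLx+1, sNx, buf)
    if (oldUnpackOrder) then
      call unpack_x_old_order(buf, 1-OLx, 0, b)
    else
      call unpack_x(buf, 1-OLx, 0, b)
    end if
    halo_matches = all(b(1-OLx:0, :, :) == a(sNx-OLx+1:sNx, :, :))
  end function halo_matches

end module exch_loop_order_sketch
```

With these sizes, halo_matches(.false.) returns .true. and halo_matches(.true.) returns .false., since a buffer packed in the new order but read back in the old order lands in the wrong cells.
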
>>
>> Also, at the end of the routine there are two barrier calls (each
>> call costs about 30% of the routine's runtime). Can these be moved
>> into the IF block like this?
>>
>>> c     _BARRIER
>>>      IF ( doingSingleThreadedComms ) THEN
>>>       _BARRIER
>>> C      Restore saved settings that were stored to allow
>>> C      single thread comms.
>>>       _BEGIN_MASTER(myThid)
>>>        DO I=1,nThreads
>>>         myBxLo(I) = myBxLoSave(I)
>>>         myBxHi(I) = myBxHiSave(I)
>>>         myByLo(I) = myByLoSave(I)
>>>         myByHi(I) = myByHiSave(I)
>>>        ENDDO
>>>       _END_MASTER(myThid)
>>>       _BARRIER
>>>      ENDIF
>>> c     _BARRIER
>>
>> If you agree with these changes I will implement and test them. I am
>> asking because I do not feel too comfortable with this part of the
>> code. Please let me know.
>>
>> (There is probably something similar in the corresponding exch2
>> routines, but I haven't tried that yet.)
>>
>> Martin
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-devel



