[MITgcm-devel] further vectorization
Martin Losch
Martin.Losch at awi.de
Thu Nov 1 12:07:30 EDT 2007
That's OK for me too, but why not move the barrier into the IF block?
Do we need to wait for every thread before executing "IF
( doingSingleThreadedComms )"?
M.
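
A minimal sketch of what the move would save, assuming the commented-out "c _BARRIER" lines in the listing quoted below mark where the two barriers sit today. The program is a hypothetical stand-in, not MITgcm code: fake_barrier only counts calls, and the two end_of_exchange_* routines are invented names for the current and the proposed placement. It says nothing about whether the move is safe. That still depends on every thread seeing the same value of doingSingleThreadedComms, since a thread that skips a barrier while others enter it would leave them waiting.

```fortran
program barrier_count
  implicit none
  integer :: nOrig, nProp

  ! Multi-threaded comms: the flag is false.
  nOrig = 0
  nProp = 0
  call end_of_exchange_original(.false., nOrig)
  call end_of_exchange_proposed(.false., nProp)
  if (nOrig /= 2 .or. nProp /= 0) error stop 'unexpected count (flag false)'
  print *, 'flag false: original =', nOrig, ' proposed =', nProp

  ! Single-threaded comms: the flag is true.
  nOrig = 0
  nProp = 0
  call end_of_exchange_original(.true., nOrig)
  call end_of_exchange_proposed(.true., nProp)
  if (nOrig /= 2 .or. nProp /= 2) error stop 'unexpected count (flag true)'
  print *, 'flag true:  original =', nOrig, ' proposed =', nProp

contains

  ! Hypothetical stand-in for _BARRIER: it only counts how often it is called.
  subroutine fake_barrier(n)
    integer, intent(inout) :: n
    n = n + 1
  end subroutine fake_barrier

  ! Assumed current placement: both barriers outside the IF block.
  subroutine end_of_exchange_original(doingSingleThreadedComms, n)
    logical, intent(in) :: doingSingleThreadedComms
    integer, intent(inout) :: n
    call fake_barrier(n)
    if (doingSingleThreadedComms) then
      continue   ! the master thread would restore the saved settings here
    end if
    call fake_barrier(n)
  end subroutine end_of_exchange_original

  ! Proposed placement: both barriers inside the IF block.
  subroutine end_of_exchange_proposed(doingSingleThreadedComms, n)
    logical, intent(in) :: doingSingleThreadedComms
    integer, intent(inout) :: n
    if (doingSingleThreadedComms) then
      call fake_barrier(n)
      continue   ! the master thread would restore the saved settings here
      call fake_barrier(n)
    end if
  end subroutine end_of_exchange_proposed

end program barrier_count
```

With the flag false the proposed placement skips both barriers, and with the flag true the two variants do the same work.
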
On 31 Oct 2007, at 16:32, Jean-Michel Campin wrote:
> Hi Martin,
>
> Chris is looking at your suggestions.
> Regarding the BARRIER thing, I was wondering if something
> like:
> IF ( nSx.NE.1 .OR. nSy.NE.1 ) THEN
>   _BARRIER
> ENDIF
> would do it. The compiler should know (since nSx & nSy are
> parameters) that it can remove those barriers when it's safe
> (both nSx & nSy = 1).
> And when nSx > 1 or nSy > 1, we can still use the same
> executable for single-threaded or multi-threaded runs
> just by changing eedata.
>
> Jean-Michel
>
> On Wed, Oct 31, 2007 at 08:59:08AM +0100, Martin Losch wrote:
>> Hi all,
>>
>> Jens-Olaf has identified (and fixed) another (small) bottleneck in
>> exch_rl_recv_get_x and exch_rl_send_put_x (and all the other files
>> that are created from the corresponding template):
>>
>> The problem: the inner loop is always over i, but for the *_x
>> routines this loop is very short (basically Olx). Because the loop
>> boundaries are not available at compile time (iMin and iMax are set
>> earlier in the routine), only the inner loop is vectorized, resulting
>> in slow code (vectorization is at 20%): the routines are among the 20
>> most expensive ones.
>> This is his suggestion (2 instances, for east and west buffers):
>>> DO K=1,myNz
>>> !CDIR NOLOOPCHG
>>>   DO I=iMin,iMax
>>>     DO J=1,sNy
>>>       iB = iB + 1
>>>       array(I,J,K,bi,bj) = eastRecvBuf_RL(iB,eBl,bi,bj)
>>>     ENDDO
>>>   ENDDO
>>> ENDDO
>> that is, exchange the loop order and add a directive so that the
>> compiler does not change the order back. I would suggest wrapping
>> the directive in #ifdef TARGET_NEC_SX / #endif.
>>
>> Also, at the end of the routine there are two barrier calls (each
>> call costs about 30% of the routine's runtime). Can these be moved
>> into the IF block like this?
>>
>>> c _BARRIER
>>> IF ( doingSingleThreadedComms ) THEN
>>>   _BARRIER
>>> C Restore saved settings that were stored to allow
>>> C single thread comms.
>>>   _BEGIN_MASTER(myThid)
>>>   DO I=1,nThreads
>>>     myBxLo(I) = myBxLoSave(I)
>>>     myBxHi(I) = myBxHiSave(I)
>>>     myByLo(I) = myByLoSave(I)
>>>     myByHi(I) = myByHiSave(I)
>>>   ENDDO
>>>   _END_MASTER(myThid)
>>>   _BARRIER
>>> ENDIF
>>> c _BARRIER
>>
>> If you agree with these changes I will implement and test them. I am
>> asking because I do not feel too comfortable with this part of the
>> code. Please let me know.
>>
>> (There is probably something similar in the corresponding exch2
>> routines, but I haven't tried that yet.)
>>
>> Martin
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-devel