[MITgcm-devel] further vectorization

Thu Nov 1 12:33:41 EDT 2007

Another point/question:
Can I make exchCollectStatistics a runtime parameter? Currently it is
set to true in exch_init.F, but if I could turn it off in eedata, that
would speed up the code too.
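
A minimal sketch of how such a switch could work, assuming the flag
were added to the EEPARMS namelist that eedata provides (it is not
there in the code discussed here, so that membership is hypothetical).
The string "record" only stands in for the contents of an eedata file
so that the example runs on its own:

```fortran
      PROGRAM EEDATA_SKETCH
      IMPLICIT NONE
!     Assumption: exchCollectStatistics becomes a member of the
!     EEPARMS namelist. It is not one in the code discussed here.
      LOGICAL exchCollectStatistics
      CHARACTER(LEN=80) record
      INTEGER ioErr
      NAMELIST /EEPARMS/ exchCollectStatistics

!     Default keeps the current behaviour (statistics switched on).
      exchCollectStatistics = .TRUE.

!     Stand-in for the relevant lines of an eedata file.
      record = '&EEPARMS exchCollectStatistics=.FALSE. /'
      READ(record, NML=EEPARMS, IOSTAT=ioErr)
      IF ( ioErr .NE. 0 ) STOP 'namelist read failed'

      IF ( exchCollectStatistics ) THEN
       PRINT *, 'exchange statistics: on'
      ELSE
       PRINT *, 'exchange statistics: off'
      ENDIF
      END
```

With the default left at true, existing eedata files would keep
behaving as before; only runs that set the flag to false would skip
the statistics.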

Martin

On 1 Nov 2007, at 17:07, Martin Losch wrote:

> That's OK for me too, but why not move the barrier into the
> IF block? Do we need to wait for every thread before executing
> "IF ( doingSingleThreadedComms )"?
>
> M.
> On 31 Oct 2007, at 16:32, Jean-Michel Campin wrote:
>
>> Hi Martin,
>>
>> Chris is looking at your suggestions.
>> Regarding the BARRIER thing, I was wondering if something
>> like:
>>       IF ( nSx.NE.1 .OR. nSy.NE.1 ) THEN
>>        _BARRIER
>>       ENDIF
>> would do it. The compiler should know (since nSx & nSy are
>> parameters) that it can remove those barriers when it's safe
>> (both nSx & nSy = 1).
>> And when nSx > 1 or nSy > 1, we can still use the same
>> executable for single-threaded or multi-threaded runs
>> just by changing eedata.
>>
>> Jean-Michel
>>
>> On Wed, Oct 31, 2007 at 08:59:08AM +0100, Martin Losch wrote:
>>> Hi all,
>>>
>>> Jens-Olaf has identified (and fixed) another (small) bottleneck in
>>> exch_rl_recv_get_x and exch_rl_send_put_x (and all the other files
>>> that are created from the corresponding template):
>>>
>>> The problem: the inner loop is always over i, but for the *_x
>>> routines this loop is very short (basically Olx). Because the loop
>>> boundaries are not available at compile time (iMin and iMax are set
>>> earlier in the routine), only this short inner loop is vectorized,
>>> resulting in slow code (vectorization is at 20%): the routines are
>>> among the 20 most expensive ones.
>>> This is his suggestion (2 instances, for east and west buffers):
>>>>          DO K=1,myNz
>>>> !CDIR NOLOOPCHG
>>>>           DO I=iMin,iMax
>>>>            DO J=1,sNy
>>>>             iB = iB + 1
>>>>             array(I,J,K,bi,bj) = eastRecvBuf_RL(iB,eBl,bi,bj)
>>>>            ENDDO
>>>>           ENDDO
>>>>          ENDDO
>>> that is, exchange the loop order and add a directive so that the
>>> compiler does not change the order back. I would suggest wrapping
>>> the directive in #ifdef TARGET_NEC_SX/#endif.
>>>
>>> Also, at the end of the routine there are two barrier calls (each
>>> call costs about 30% of routine runtime). Can these be moved into
>>> the IF block like this?
>>>
>>>> c     _BARRIER
>>>>      IF ( doingSingleThreadedComms ) THEN
>>>>     _BARRIER
>>>> C      Restore saved settings that were stored to allow
>>>> C      single thread comms.
>>>>       _BEGIN_MASTER(myThid)
>>>>        DO I=1,nThreads
>>>>         myBxLo(I) = myBxLoSave(I)
>>>>         myBxHi(I) = myBxHiSave(I)
>>>>         myByLo(I) = myByLoSave(I)
>>>>         myByHi(I) = myByHiSave(I)
>>>>        ENDDO
>>>>       _END_MASTER(myThid)
>>>>     _BARRIER
>>>>      ENDIF
>>>> c     _BARRIER
>>>
>>> If you agree with these changes I will implement and test them. I am
>>> asking because I do not feel too comfortable with this part of the
>>> code. Please let me know.
>>>
>>> (There is probably something similar in the corresponding exch2
>>> routines, but I haven't tried that yet.)
>>>
>>> Martin
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel