[MITgcm-devel] further vectorization

Wed Oct 31 03:59:08 EDT 2007

Hi all,

Jens-Olaf has identified (and fixed) another (small) bottleneck in  
exch_rl_recv_get_x and exch_rl_send_put_x (and all the other files  
that are created from the corresponding template):

The problem: the inner loop is always over i, but for the *_x
routines this loop is very short (basically Olx iterations). Because the
loop bounds are not known at compile time (iMin and iMax are set
earlier in the routine), the compiler vectorizes only this short inner
loop, resulting in slow code (vectorization is at 20%); the routines are
among the 20 most expensive ones.
This is his suggestion (2 instances, for east and west buffers):
>           DO K=1,myNz
> !CDIR NOLOOPCHG
>            DO I=iMin,iMax
>             DO J=1,sNy
>              iB = iB + 1
>              array(I,J,K,bi,bj) = eastRecvBuf_RL(iB,eBl,bi,bj)
>             ENDDO
>            ENDDO
>           ENDDO
That is, exchange the loop order (so that the longer J loop becomes the
inner, vectorized one) and add a directive so that the compiler does not
change the order back. I would suggest wrapping the directive in
#ifdef TARGET_NEC_SX / #endif.
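
For illustration, here is a small stand-alone sketch (not the actual
exch template) of the swapped loop order with the guarded directive. The
array sizes, the single flat buffer "buf" and the program itself are
made up for the example; only the loop structure and the directive come
from the suggestion above. It also shows one thing to keep in mind: the
send_put and recv_get sides have to walk the buffer in the same I/J
order, otherwise the halo values end up in the wrong places.

```fortran
program loop_order_demo
  implicit none
  ! Hypothetical small tile, for illustration only
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: sNx = 4, sNy = 3, OLx = 2, myNz = 2
  real(dp) :: src(1-OLx:sNx+OLx, sNy, myNz)
  real(dp) :: dst(1-OLx:sNx+OLx, sNy, myNz)
  real(dp) :: buf(OLx*sNy*myNz)
  integer :: i, j, k, iB, iMin, iMax

  ! Fill the source with a distinct value per grid point
  do k = 1, myNz
    do j = 1, sNy
      do i = 1-OLx, sNx+OLx
        src(i,j,k) = real(100*k + 10*j + i, dp)
      end do
    end do
  end do
  dst = 0.0_dp

  ! "send_put" side: pack the easternmost OLx interior columns.
  ! I is the middle loop, J the (longer) inner loop.
  iMin = sNx - OLx + 1
  iMax = sNx
  iB = 0
  do k = 1, myNz
#ifdef TARGET_NEC_SX
!CDIR NOLOOPCHG
#endif
    do i = iMin, iMax
      do j = 1, sNy
        iB = iB + 1
        buf(iB) = src(i,j,k)
      end do
    end do
  end do

  ! "recv_get" side: unpack into the western halo, same loop order
  iMin = 1 - OLx
  iMax = 0
  iB = 0
  do k = 1, myNz
#ifdef TARGET_NEC_SX
!CDIR NOLOOPCHG
#endif
    do i = iMin, iMax
      do j = 1, sNy
        iB = iB + 1
        dst(i,j,k) = buf(iB)
      end do
    end do
  end do

  ! The western halo should now equal the eastern interior edge
  if (all(dst(1-OLx:0,:,:) == src(sNx-OLx+1:sNx,:,:))) then
    print *, 'halo matches'
  else
    print *, 'halo MISMATCH'
  end if
end program loop_order_demo
```

The #ifdef lines need the C preprocessor (e.g. a .F90 suffix, or
gfortran -cpp). On other compilers the directive is just a comment, so
the guard is mainly there to keep the intent explicit.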

Also, at the end of the routine there are two barrier calls (each
call costs about 30% of the routine's runtime). Can these be moved into
the IF block like this?

> c     _BARRIER
>       IF ( doingSingleThreadedComms ) THEN
>        _BARRIER
> C      Restore saved settings that were stored to allow
> C      single thread comms.
>        _BEGIN_MASTER(myThid)
>         DO I=1,nThreads
>          myBxLo(I) = myBxLoSave(I)
>          myBxHi(I) = myBxHiSave(I)
>          myByLo(I) = myByLoSave(I)
>          myByHi(I) = myByHiSave(I)
>         ENDDO
>        _END_MASTER(myThid)
>        _BARRIER
>       ENDIF
> c     _BARRIER

If you agree with these changes, I will implement and test them. I am
asking because I do not feel too comfortable with this part of the
code. Please let me know.

(There is probably something similar in the corresponding exch2
routines, but I haven't tried that yet.)

Martin