[MITgcm-devel] further vectorization

Jean-Michel Campin jmc at ocean.mit.edu
Wed Oct 31 11:32:47 EDT 2007


Hi Martin,

Chris is looking at your suggestions.
Regarding the BARRIER thing, I was wondering if something
like:
      IF ( nSx.NE.1 .OR. nSy.NE.1 ) THEN
       _BARRIER
      ENDIF
would do it. Since nSx & nSy are parameters, the compiler should
know that it can remove those barriers when it's safe
(both nSx & nSy = 1).
And when nSx > 1 or nSy > 1, we can still use the same
executable for single-threaded or multi-threaded runs
just by changing eedata.
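
A minimal standalone sketch of that idea, assuming nSx and nSy are
declared as PARAMETERs and using a plain counter as a stand-in for
the _BARRIER macro. The program name, the counter and the values
chosen for nSx and nSy are made up for illustration only:

```fortran
      PROGRAM BARRIER_SKETCH
      IMPLICIT NONE
C     nSx, nSy: number of tiles per process in x and y.
C     The values below are assumed for this sketch.
      INTEGER nSx, nSy
      PARAMETER ( nSx = 1, nSy = 1 )
C     nBarrier: hypothetical counter standing in for _BARRIER
      INTEGER nBarrier
      nBarrier = 0
C     The test involves only compile-time constants, so an
C     optimizing compiler is free to drop the whole block
C     when nSx = nSy = 1.
      IF ( nSx.NE.1 .OR. nSy.NE.1 ) THEN
       nBarrier = nBarrier + 1
      ENDIF
      PRINT *, 'barriers executed: ', nBarrier
      END
```

Changing either PARAMETER to a value larger than 1 and recompiling
keeps the block, so the barrier stand-in is executed once.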

Jean-Michel

On Wed, Oct 31, 2007 at 08:59:08AM +0100, Martin Losch wrote:
> Hi all,
> 
> Jens-Olaf has identified (and fixed) another (small) bottleneck in  
> exch_rl_recv_get_x and exch_rl_send_put_x (and all the other files  
> that are created from the corresponding template):
> 
> The problem: the inner loop is always over i, but for the *_x  
> routines this loop is very short (basically Olx). Because the loop  
> boundaries are not available at compile time (iMin and iMax are set  
> earlier in the routine), only the inner loop is vectorized, resulting  
> in slow code (vectorization is at 20%): the routines are among the 20  
> most expensive ones.
> This is his suggestion (2 instances, for east and west buffers):
> >          DO K=1,myNz
> >!CDIR NOLOOPCHG
> >           DO I=iMin,iMax
> >            DO J=1,sNy
> >             iB = iB + 1
> >             array(I,J,K,bi,bj) = eastRecvBuf_RL(iB,eBl,bi,bj)
> >            ENDDO
> >           ENDDO
> >          ENDDO
> that is, exchange the loop order and add a directive so that the  
> compiler does not change the order back. I would suggest putting the  
> directive inside #ifdef TARGET_NEC_SX/#endif
> 
> Also, at the end of the routine there are two barrier calls (each  
> call costs about 30% of routine runtime). Can these be moved into the  
> IF Block like this?
> 
> >c     _BARRIER
> >      IF ( doingSingleThreadedComms ) THEN
> >       _BARRIER
> >C      Restore saved settings that were stored to allow
> >C      single thread comms.
> >       _BEGIN_MASTER(myThid)
> >        DO I=1,nThreads
> >         myBxLo(I) = myBxLoSave(I)
> >         myBxHi(I) = myBxHiSave(I)
> >         myByLo(I) = myByLoSave(I)
> >         myByHi(I) = myByHiSave(I)
> >        ENDDO
> >       _END_MASTER(myThid)
> >       _BARRIER
> >      ENDIF
> >c     _BARRIER
> 
> If you agree with these changes I will implement and test them. I am  
> asking because I do not feel too comfortable with this part of the  
> code. Please let me know.
> 
> (There is probably something similar in the corresponding exch2  
> routines, but I haven't tried that yet.)
> 
> Martin
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel


