[MITgcm-devel] further vectorization
Martin Losch
Martin.Losch at awi.de
Wed Oct 31 03:59:08 EDT 2007
Hi all,
Jens-Olaf has identified (and fixed) another (small) bottleneck in
exch_rl_recv_get_x and exch_rl_send_put_x (and all the other files
that are created from the corresponding template):
The problem: the inner loop is always over i, but for the *_x
routines this loop is very short (basically OLx iterations). Because
the loop bounds are not known at compile time (iMin and iMax are set
earlier in the routine), the compiler vectorizes only this short inner
loop, which results in slow code (vectorization is at 20%); the
routines are among the 20 most expensive ones.
This is his suggestion (2 instances, for east and west buffers):
> DO K=1,myNz
> !CDIR NOLOOPCHG
> DO I=iMin,iMax
> DO J=1,sNy
> iB = iB + 1
> array(I,J,K,bi,bj) = eastRecvBuf_RL(iB,eBl,bi,bj)
> ENDDO
> ENDDO
> ENDDO
that is, exchange the loop order and add a directive that keeps the
compiler from changing the order back. I would suggest wrapping the
directive in #ifdef TARGET_NEC_SX/#endif.
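To make the interchange concrete, here is a minimal, self-contained
sketch (not the actual exch template). The array sizes, the names src,
dst and buf, and the single-tile layout are all hypothetical and chosen
only for illustration. It packs an eastern edge strip into a buffer and
unpacks it into a western halo, with J as the long inner loop in both
places. The point it tries to show is that the buffer ordering changes
with the loop order, so the send/put and recv/get sides have to be
changed together.

```fortran
program loop_order_demo
  implicit none
  ! Hypothetical sizes for illustration only (not MITgcm's SIZE.h values).
  integer, parameter :: sNx = 8, sNy = 6, OLx = 2, myNz = 3
  real(kind=8) :: src(1-OLx:sNx+OLx, sNy, myNz)
  real(kind=8) :: dst(1-OLx:sNx+OLx, sNy, myNz)
  real(kind=8) :: buf(OLx*sNy*myNz)
  integer :: i, j, k, iB, iMin, iMax

  ! Give every grid point a unique value so ordering errors are visible.
  do k = 1, myNz
    do j = 1, sNy
      do i = 1-OLx, sNx+OLx
        src(i,j,k) = real(i + 100*j + 10000*k, kind=8)
      end do
    end do
  end do
  dst = 0.0d0

  ! "send/put" side: pack the easternmost OLx interior columns.
  ! The short I loop is now the middle loop; the long J loop is innermost.
  ! In a preprocessed .F file the directive below would sit inside
  ! #ifdef TARGET_NEC_SX / #endif; other compilers see it as a comment.
  iMin = sNx - OLx + 1
  iMax = sNx
  iB = 0
  do k = 1, myNz
    !CDIR NOLOOPCHG
    do i = iMin, iMax
      do j = 1, sNy
        iB = iB + 1
        buf(iB) = src(i,j,k)
      end do
    end do
  end do

  ! "recv/get" side: unpack into the western halo with the SAME loop
  ! order, so that buffer element iB refers to the same (i,j,k) offset.
  iMin = 1 - OLx
  iMax = 0
  iB = 0
  do k = 1, myNz
    !CDIR NOLOOPCHG
    do i = iMin, iMax
      do j = 1, sNy
        iB = iB + 1
        dst(i,j,k) = buf(iB)
      end do
    end do
  end do

  ! Check: the western halo must equal the eastern edge strip.
  do k = 1, myNz
    do j = 1, sNy
      do i = 1-OLx, 0
        if (dst(i,j,k) /= src(i+sNx,j,k)) error stop 'halo mismatch'
      end do
    end do
  end do
  print *, 'halo exchange round trip OK'
end program loop_order_demo
```

If only one of the two sides were switched to the new loop order, the
final check in this sketch would fail, because the buffer would be
filled in one order and read back in another.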
Also, at the end of the routine there are two barrier calls (each
call costs about 30% of the routine's runtime). Can these be moved
into the IF block like this?
> c _BARRIER
> IF ( doingSingleThreadedComms ) THEN
> _BARRIER
> C Restore saved settings that were stored to allow
> C single-threaded comms.
> _BEGIN_MASTER(myThid)
> DO I=1,nThreads
> myBxLo(I) = myBxLoSave(I)
> myBxHi(I) = myBxHiSave(I)
> myByLo(I) = myByLoSave(I)
> myByHi(I) = myByHiSave(I)
> ENDDO
> _END_MASTER(myThid)
> _BARRIER
> ENDIF
> c _BARRIER
If you agree with these changes, I will implement and test them. I am
asking because I am not very comfortable with this part of the
code. Please let me know.
(There is probably something similar in the corresponding exch2
routines, but I haven't tried that yet.)
Martin