[MITgcm-devel] vectorizing adseaice_solve4temp
Patrick Heimbach
heimbach at MIT.EDU
Thu Nov 25 07:34:56 EST 2010
Hi Martin,
as far as I remember, my only reason lately for not storing
variables inside iteration loops was that the recomputations are
usually benign, as you say, and I didn't want to increase the memory
footprint further (this is starting to hit us on machines like Pleiades).
The array size is sNx*sNy*(max. no. of iters).
We've actually applied a similar method before, e.g. for EXF;
see (in the_main_loop):
CADJ INIT comlev1_exf_2
CADJ & = COMMON,niter_bulk*nchklev_1*snx*nsx*sny*nsy*nthreads_chkpt
I'd say, since it has such a dramatic effect for you,
just add it without the TARGET_SX flag to keep things simple.
But instead of
CADJ INIT comlev1_solve4temp = COMMON, sNx*sNy*10
I'd put
CADJ INIT comlev1_solve4temp = COMMON, sNx*sNy*NMAX_TICE
and declare NMAX_TICE as a parameter (slightly more transparent?).
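A minimal sketch of this suggestion, written so it compiles on its own: the tape size and the run-time guard both come from one named parameter, NMAX_TICE, instead of the literal 10. The function name NITER_FITS_TAPE and the value NMAX_TICE = 10 are hypothetical; in seaice_solve4temp the result would presumably feed the existing STOP on IMAX_TICE.

```fortran
!     Hypothetical helper illustrating the suggestion: size the local
!     tape with a named parameter NMAX_TICE instead of the literal 10.
!     In seaice_solve4temp the tape declaration would then read
!       CADJ INIT comlev1_solve4temp = COMMON, sNx*sNy*NMAX_TICE
      LOGICAL FUNCTION NITER_FITS_TAPE( imaxTice )
      IMPLICIT NONE
!     Assumed value; the thread hard-wires room for 10 iterations.
      INTEGER NMAX_TICE
      PARAMETER ( NMAX_TICE = 10 )
      INTEGER imaxTice
!     The tape holds NMAX_TICE records per (i,j), so a larger
!     iteration count would overrun comlev1_solve4temp.
      NITER_FITS_TAPE = imaxTice .LE. NMAX_TICE
      RETURN
      END
```

A caller would write something like IF (.NOT. NITER_FITS_TAPE(IMAX_TICE)) STOP ..., so the guard and the CADJ INIT size agree as long as both take NMAX_TICE from the same place (where it would be declared is an assumption; the thread does not say).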
Cheers
-p.
On Nov 25, 2010, at 5:56 AM, Martin Losch wrote:
> Hi Patrick,
>
> the adjoint code of seaice_solve4temp handles a RECOMPUTATION by creating a local array tsurfloch. Within the "adjoint" iteration, tsurfloch is copied back to tsurfloc. Unfortunately, the full array is copied back for each (i,j). For most people this is probably benign, but on the SX8 it destroys performance, because only the copying loop is vectorized. With Ralf's help I have found a way to overcome this by defining a local tape at the beginning of solve4temp (with a hack to avoid introducing another global maximum value for now):
>> #ifdef ALLOW_AUTODIFF_TAMC
>> IF (IMAX_TICE .GT. 10) THEN
>> STOP 'S/R SEAICE_SOLVE4TEMP: IMAX_TICE > 10'
>> ENDIF
>> CADJ INIT comlev1_solve4temp = COMMON, sNx*sNy*10
>> #endif /* ALLOW_AUTODIFF_TAMC */
> and in the iteration:
>> Ccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
>> DO ITER=1,IMAX_TICE
>> DO J=1,sNy
>> DO I=1,sNx
>> #ifdef ALLOW_AUTODIFF_TAMC
>> iicekey = I + sNx*(J-1) + (ITER-1)*sNx*sNy
>> CADJ STORE tsurfloc(i,j) = comlev1_solve4temp,
>> CADJ & key = iicekey, byte = isbyte
>> #endif /* ALLOW_AUTODIFF_TAMC */
>>
>> IF ( iceOrNot(I,J) ) THEN
> This changes something like
>> tsurfloc(i,j) = max(273.16d0+min_tice,tsurfloc(i,j))
>> adtsurfloc(i,j) = adtsurfloc(i,j)*(0.5+sign(0.5d0,tmelt-
>> $tsurfloc(i,j)))
>> do ip2 = 1, sny
>> do ip1 = 1, snx
>> tsurfloc(ip1,ip2) = tsurfloch(ip1,ip2)
>> end do
>> end do
> to
>> tsurfloc(i,j) = max(273.16d0+min_tice,tsurfloc(i,j))
>> adtsurfloc(i,j) = adtsurfloc(i,j)*(0.5+sign(0.5d0,tmelt-
>> $tsurfloc(i,j)))
>> tsurfloc(i,j) = comlev1_solve4temp_tsurfloc_1h(iicekey)
>
> and the routine is dramatically faster (because of the more efficient vectorization). Should I wrap this in TARGET_SX CPP flags, or is it OK for everyone? I'm asking because there was a reason for not storing tsurfloc earlier, right?
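Why sNx*sNy*10 records suffice: the key in the iteration loop above enumerates every (I,J,ITER) combination exactly once, running from 1 to sNx*sNy*IMAX_TICE, which is also why the guard stops for IMAX_TICE > 10. A small sketch reproducing that formula; the function name ICEKEY and its argument list are hypothetical.

```fortran
!     Hypothetical helper reproducing the tape key used in the
!     iteration loop:
!       iicekey = I + sNx*(J-1) + (ITER-1)*sNx*sNy
!     For 1<=I<=sNx, 1<=J<=sNy, 1<=ITER<=IMAX_TICE the keys cover
!     1..sNx*sNy*IMAX_TICE without gaps or repeats, so each stored
!     tsurfloc(i,j) gets its own record on comlev1_solve4temp.
      INTEGER FUNCTION ICEKEY( I, J, ITER, nx, ny )
      IMPLICIT NONE
      INTEGER I, J, ITER, nx, ny
      ICEKEY = I + nx*(J-1) + (ITER-1)*nx*ny
      RETURN
      END
```

Because I varies fastest, consecutive I within one (J,ITER) map to consecutive keys, which is what lets the restore tsurfloc(i,j) = comlev1_solve4temp_tsurfloc_1h(iicekey) act on a single element instead of the whole array.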
>
> Martin
>
>
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel
---
Patrick Heimbach | heimbach at mit.edu | http://www.mit.edu/~heimbach
MIT | EAPS 54-1518 | 77 Massachusetts Ave | Cambridge MA 02139 USA
FON +1-617-253-5259 | FAX +1-617-253-4464 | SKYPE patrick.heimbach