[MITgcm-devel] vectorization of layers

Martin Losch Martin.Losch at awi.de
Mon Jan 7 05:58:28 EST 2013


Me again,

I am attaching an only slightly changed layers_fluxcalc.F, where I just split the ij-loops so that now the problematic part of the code is "isolated" and the rest of the routine can be vectorized (if you want to we can put #ifdef's around theses "splits"). I also changed the loop bounds for some initialization. This already improves the performance by a factor of about 2.6:
FREQUENCY  EXCLUSIVE       AVER.TIME    MOPS  MFLOPS V.OP  AVER.   VECTOR I-CACHE O-CACHE    BANK  PROG.UNIT
           TIME[sec](  % )    [msec]                 RATIO V.LEN    TIME   MISS    MISS      CONF
100   269.750( 86.1)  2697.502   630.0   101.7  5.98 181.0    0.753  0.0082 119.6809  6.7933  layers_fluxcalc (old)
100   107.462( 75.4)  1074.619  1229.6   217.7 61.83 181.0    7.717  0.0052 25.9203 44.6656  layers_fluxcalc (new)

the "exclusive time" has gone down from 270 to 107sec (still 75% of the entire runtime), vector operation ratio in increase from 6% to 62% (i.e. only 38% of the routine cannot be vectorized). I think that at least this change should be checked-in, but I would like you to test, if the changes have any results of the layers package.
What remains to be done is to fix the remaining loops (lines 144-180 and 135-170). The print statements (PRINT_ERROR) also needs to be taken out of these loops, but that's the smallest problem.
Here, it's important to understand what's going on in the if-block
after initialisation: kgu=Nlayers, so we don't need the first if (TatU.ge.layers_bounds(Nlayers,iLa)), right?
elseif (TatU.lt.layers_bounds(2,iLa)) is required
elseif (TatU.ge. layers_bounds(kgu,iLa).and.TatU.lt.layers_bounds(kgu+1,iLa)) is not required (because nothing happens)
The above can probably be put into a separate i,j-loop. Then one needs a new kkk=1,NZZ loop, and within this loop one increments kgu if "TatU.ge.layers_bounds(kgu+1,iLa), but not conditions 1+2+3", and "decrements for "TatU.lt.layers_bounds(kgu,ila) but no 1+2+3". 
Will that work algorithmically?

Martin

On Jan 7, 2013, at 9:36 AM, Martin Losch <Martin.Losch at awi.de> wrote:

> Hi Ryan,
> 
> I didn't want to make it sound as if the code is inefficient. The code is probably efficient for the type of current cpu that most people use, but for vector cpus it is absolutely important that the code "vectorizes"; the "scalar" performance of these computers is very poor. I am attaching an example to illustrate how very small changes can speep up the code dramatically (especially in the calc_r_star example).
> 
> For vectorization to work, the operations within a loop need to be independent of each other. Often I don't understand why something vectorizes or not, but there are analysis tools that help with that. In the present case the while loops are not vectorizable, because they clearly depend on the previous iterate (the while conditions is a function kgu, that is begin computed within the loop). Usually, what I do in cases like these is to move the problematic loop outside the i,j loops. In this case it is a little more difficult, because the while iteration will be different at each (i,j) point. 
> This is how I think this need to be changed in order to vectorize:
> - turn TatU/V into 2D arrays and precompute them for each kk (small change)
> - split the if-block starting at l143 ("IF (TatU .GE. layers_bounds(Nlayers,iLa)) THEN"). The first 2 (3, but in the third case nothing happens) case are simple and can remain in the i,j-loop. The 4th and 5th case (the ones with the while-loops) could be replace by a second kk-loop and appropriate if statements, like this;
> DO kk
> DO i,j
> IF ( appropriate conditions) THEN kgu(i,j) = something
> otherwise do nothing
> ENDDO
> ENDDO
> But this is where I am not sure if it works at all. Maybe it would be good ot set up or use a test-experiment (in the spirit of the verification experiments) that checks the results of the layers package, so that I know when I break the algorithm. 
> Do you have something simple like that? That would already help greatly. I would then try to come up with a solution and you can tell me if you like it. Most likely, this solution will involve a "ifdef VECTORIZE then do the new code, #else do the old code".
> 
> Martin
> 
> On Jan 4, 2013, at 6:51 PM, Ryan Abernathey <ryan.abernathey at gmail.com> wrote:
> 
>> I wrote this code. What is going on here is the most important and expensive part of the LAYERS package: the sorting of each point in the water column into the appropriate density class. This is essentially a search problem (it has to hunt for the correct bin), and I based the code on one of the search algorithms from Numerical Recipes. I did some basic testing and optimization of the code in MATLAB before putting it into the model, but I am sure there is lots of room for improvement.
>> 
>> The "hunting" takes place inside the while loops. kgu is an index for layers_bounds that tracks which target density bin is being tested. If the bin is too cold for the current point (whose density value is TatU), kgu is augmented. The reason for using a while-loop is that it only runs as long as it has to until the search condition is met. This seemed like the fastest thing at the time.
>> 
>> This search has to run on each point in the refined vertical grid (Nr * FineGridFact), so any improvement to its execution time would really multiply.  
>> 
>> I have only a vague understanding of what vectorization is, but I would be happy to work with someone who understands such things better in order to make the code more efficient. 
>> 
>> -Ryan
>> 
>> 
>> On Fri, Jan 4, 2013 at 10:58 AM, Martin Losch <Martin.Losch at awi.de> wrote:
>> Hi there,
>> I am not sure how much effort one still should invest into vectorizing the code, since our vector machine will only run for another year or so, but can you think of a good way of rewrite layers_fluxcalc.F so that it vectorizes? Currently these lines (and the corresponding for TatV):
>> 
>>         ELSE IF (TatU .GE. layers_bounds(kgu(i,j),iLa)) THEN
>> C        have to hunt for the right bin by getting hotter
>>          DO WHILE (TatU .GE. layers_bounds(kgu(i,j)+1,iLa))
>>           kgu(i,j) = kgu(i,j) + 1
>>          ENDDO
>> C         now layers_bounds(kgu(i,j)+1,iLa) < TatU <= layers_bounds(kgu(i,j)+1,iLa)
>>         ELSE IF (TatU .LT. layers_bounds(kgu(i,j)+1,iLa)) THEN
>> C        have to hunt for the right bin by getting colder
>>          DO WHILE (TatU .LT. layers_bounds(kgu(i,j),iLa))
>>           kgu(i,j) = kgu(i,j) - 1
>>          ENDDO
>> C         now layers_bounds(kgu(i,j)+1,iLa) <= TatU < layers_bounds(kgu(i,j)+1,iLa)
>>         ELSE
>> 
>> inhibit vectorization (because the while-loops don't vectorize), and thus making it impossible to use the layers-pkg on a vector computer (This routines has a vector operation ratio of 5% and takes 86% of the computer time). Currently I have no clue what to do here, but I don't quite understand, what's going on in thie package/routine.
>> 
>> Martin
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>> 
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
> 
> 
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: layers_fluxcalc.F
Type: application/octet-stream
Size: 11690 bytes
Desc: not available
URL: <http://mitgcm.org/pipermail/mitgcm-devel/attachments/20130107/3fc5ef1b/attachment-0001.obj>


More information about the MITgcm-devel mailing list