[MITgcm-devel] Re: thsice is reeeeeeeeeally scalar!
Jean-Michel Campin
jmc at ocean.mit.edu
Mon Oct 1 09:25:06 EDT 2007
Hi Martin,
I already made a few changes in the thSIce pkg to push the i,j
loops inside some S/R, but since I did not re-write those
S/R, they might still not vectorize well.
I could probably change thsice_reshape_layers easily too.
I don't have time now, but I will come back to this later on.
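To be concrete about "pushing the loops inside", the change goes from
the first form to the second (just a toy sketch with made-up names,
not the actual thsice code):

      SUBROUTINE DEMO_POINT_CALC( h, dh )
C     point-wise form: called once per grid point, so the
C     compiler has no loop to vectorize
      IMPLICIT NONE
      REAL*8 h, dh
      dh = 0.1D0*h
      RETURN
      END

      SUBROUTINE DEMO_TILE_CALC( nx, ny, h, dh )
C     tile form: the i,j loops live inside the S/R, so the
C     compiler sees a long inner loop and can vectorize it
      IMPLICIT NONE
      INTEGER nx, ny, i, j
      REAL*8 h(nx,ny), dh(nx,ny)
      DO j = 1, ny
       DO i = 1, nx
        dh(i,j) = 0.1D0*h(i,j)
       ENDDO
      ENDDO
      RETURN
      END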
As far as thsice_get_exf is concerned, you probably know better
than me.
"barrier" calls: I don't quiet understand the issue: it's
alway placed outside i,j,bi,bj loops, so inlining should not be
an issue. And few might not be needed, but if you are
running single thread, they are pretty much harmless.
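To illustrate where they sit (a schematic only, from memory, not
lifted from any particular routine):

      SUBROUTINE DEMO_TILE_LOOP( phi, myThid )
C     per-point work inside the i,j,bi,bj loops; the BARRIER
C     comes only after all tiles are done
      IMPLICIT NONE
#include "SIZE.h"
#include "EEPARAMS.h"
      REAL*8 phi(1-OLx:sNx+OLx,1-OLy:sNy+OLy,nSx,nSy)
      INTEGER myThid
      INTEGER bi, bj, i, j
      DO bj = myByLo(myThid), myByHi(myThid)
       DO bi = myBxLo(myThid), myBxHi(myThid)
        DO j = 1, sNy
         DO i = 1, sNx
          phi(i,j,bi,bj) = 0.D0
         ENDDO
        ENDDO
       ENDDO
      ENDDO
      CALL BARRIER( myThid )
      RETURN
      END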
thsice_advection: you wrote that this one vectorizes well;
I found this unexpected, but it is good news anyway.
And it's different from seaice_advection. We talked about
this before and can talk on the phone for details, but seaice
is more like test-2 (in http://mitgcm.org/~jmc/old/sea_ice_mar07.pdf)
whereas thsice is more like test-3.
Cheers,
Jean-Michel
On Mon, Oct 01, 2007 at 01:51:50PM +0200, Martin Losch wrote:
> Hi Jens-Olaf,
>
> thanks for your input, a few questions/answers below:
> On 1 Oct 2007, at 12:48, Jens-Olaf Beismann wrote:
>
> >Martin,
> >
> >I just had a very brief look at your ftraces:
> >
> >- on how many processors did you run these tests?
> 1 CPU only, does that matter?
> >- in both tests the total number of procedure calls is very high
> >- in the THSICE case, thsice_get_exf and thsice_reshape_layers
> >together give appr. 25e6 calls
> >- can these be inlined, and might inlining improve the
> >vectorisation of the thsice routines you mentioned? Maybe
> >vectorising THSICE isn't that big a task after all.
> You are right, the many thsice subroutines are called from within i,j-
> loops and are candidates for inlining, but as you and I have found out,
> inlining is not trivial for sxf90+MITgcm because it breaks the
> genmake2 script. It will be necessary to restrict this optimization
> option to specific configurations. Alternatively one could inline the
> subroutines manually or do loop-pushing, but that is probably the
> huge task I was talking about.
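>
> Manual inlining would amount to something like this (a toy sketch
> with made-up names, not an actual MITgcm routine):
>
>       SUBROUTINE DEMO_INLINE( sNx, sNy, a, b )
>       IMPLICIT NONE
>       INTEGER sNx, sNy, i, j
>       REAL*8 a(sNx,sNy), b(sNx,sNy)
> C     instead of "CALL POINT_SR( a(i,j), b(i,j) )" in the loop
> C     body, paste the body in, so the inner loop vectorizes:
>       DO j = 1, sNy
>        DO i = 1, sNx
>         b(i,j) = 2.0D0*a(i,j) + 1.0D0
>        ENDDO
>       ENDDO
>       RETURN
>       END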
> >- inlining should also be applied to other routines, cf. the ones I
> >listed in the cubed sphere case
> Same problem as above. I actually do inline a few subroutines. Some
> routines, however, e.g. fool_the_compiler, are meant to defeat the
> optimizer and must NOT be inlined. I am not too familiar with
> the code bits where this is important (multithreading!).
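>
> (My understanding of why, sketched with stand-in names rather than
> the real eesupp code: in a spin-wait on a shared flag, an optimizing
> compiler may keep the flag in a register; an opaque external call
> forces a re-load on every pass.)
>
>       SUBROUTINE DEMO_SPIN_WAIT( flagDone )
>       IMPLICIT NONE
>       LOGICAL flagDone
>    10 CONTINUE
>       IF ( .NOT. flagDone ) THEN
> C       opaque do-nothing call, compiled in a separate file;
> C       inlining it would let the compiler cache flagDone again
>        CALL DEMO_DO_NOTHING( flagDone )
>        GOTO 10
>       ENDIF
>       RETURN
>       END
>
>       SUBROUTINE DEMO_DO_NOTHING( anyFlag )
>       IMPLICIT NONE
>       LOGICAL anyFlag
>       RETURN
>       END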
> >- you might want to try to get rid of some "barrier" calls as well.
> That's for Jean-Michel to decide; he knows which ones are superfluous.
> >- regarding the advection routines, it would be helpful to compare
> >the corresponding compiler listings
> True, but I was hoping that Jean-Michel would be able to tell me
> right away what's different between these routines; they should be
> very similar.
>
> Martin
> >
> >Cheers,
> >
> >Jens-Olaf
> >
> >>in my crusade to turn the MITgcm into a true vector code I noticed
> >>that the thsice package would require a lot of work. I have
> >>attached (in a gzipped tar-ball) the output of a comparison
> >>between runs with seaice+thsice and seaice only. The domain is
> >>243x170x33 (Rüdiger Gerdes' Arctic Ocean configuration from
> >>AOMIP), and I integrate for 10 days with deltaT=900sec, so 960
> >>timesteps.
> >>If you have a look at ftrace.txt_thsice and ftrace.txt_seaice
> >>(from flow trace analyses) you'll notice a few things:
> >>1. mom_calc_visc is by far the most expensive routine, probably
> >>because I use the Leith scheme; I use a slightly lower
> >>optimization, -Cvopt instead of -Chopt, for this routine, but I
> >>still find this surprising. I would have expected cg2d to be the
> >>top runner.
> >>2. all routines that start with thsice_* have a zero vector
> >>operation ratio, and from the MFLOPS you can see that they are
> >>really slow because of that.
> >>3. one exception: seaice_advection (V. OP. Ratio = 83%) vectorises
> >>worse than thsice_advection (99.53%). I have no idea why.
> >>4. everything else looks decent except for the exch_rl_send/recv
> >>routines. I am not touching them without detailed instructions.
> >>As a consequence the seaice+thsice run is slower (692sec vs. 558sec,
> >>stdout.*). The excess time is spent in THSICE_MAIN (146.91sec, as
> >>opposed to seaice_growth+seaice_advdiff = 31.48-13.21=18.27sec).
> >>I don't want to undertake the huge task of vectorizing thsice, but
> >>why is seaice_advection so different from thsice_advection (Jean-
> >>Michel?).
> >>Martin
> >>CC to Jens-Olaf, although he cannot reply to this list, I guess
> >>(just MITgcm-support at mitgcm.org).
> >
> >
> >--
> >Dr. Jens-Olaf Beismann Benchmarking Analyst
> >NEC High Performance Computing Europe GmbH
> >Prinzenallee 11, D-40549 Duesseldorf, Germany
> >Tel: +49 4326 288859 (office) +49 160 183 5289 (mobile)
> >Fax: +49 4326 288861 http://www.hpce.nec.com
> >
>
>
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel