[MITgcm-support] Re: MITgcm vectorization on NEC SX

chris hill cnh at mit.edu
Thu Jun 7 12:39:55 EDT 2007

Jens-Olaf (and everyone else),

  Thanks for your input, it's very useful.

  If possible, it would be useful if people could overcome their shyness 
and post their trace information to mitgcm-support at mitgcm.org. That way 
we can rapidly see what is showing up in terms of modules and loops that 
are not vectorizing too well. In the past, most discussions about 
vectorization on the German NEC installs (the underlying algorithms and 
kernel numerics are, in general, very amenable to it) have studiously 
left out the core MITgcm development team, which isn't always useful!

Jens-Olaf Beismann wrote:
> Dear all,
> it's good to know that there's quite some interest in using the MITgcm 
> on SX systems. I talked to Martin earlier this afternoon and explained 
> that unfortunately I'm too busy at the moment to really get involved in 
> optimising various MITgcm configurations. But I'll try to answer any 
> specific questions you might have if I find a spare moment. I'd suggest 
> that you collect all relevant information (SIZE.h, MPIPROGINF, ftrace 
> etc.) and make it accessible for me on the machine you're using.
> Please note that I'll probably not be able to answer any request during 
> the next couple of weeks. You might also want to involve Armin Koehl (I 
> guess most of you know him) in the discussion; he has much experience 
> both with the forward and the adjoint code on SX-6.
> Cheers,
> Jens-Olaf
>> You may have hit upon the problem with my configuration. I am indeed 
>> using pkg/dic where the looping is all
>> over the slowest k index.
>> How did you rewrite your gchem code? Can you share some of it with me? 
>> Perhaps I can try my hand at rewriting
>> pkg/dic?
>> Also, I am not using any tiles in the x-direction. But obviously the 
>> biogeochemistry will kill any benefit of doing so.
>> I do find that adding the -C hopt optimization flag (not in any of the 
>> existing build options files) helps quite a bit.
>> Samar
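[Editor's note: Samar's "-C hopt" flag would go into the optimization variable of a genmake2 build-options file. A hedged sketch follows; only "-C hopt" comes from the discussion above, and the surrounding variable names follow the standard tools/build_options layout, not a tested SX configuration.]

```shell
# Sketch of a build-options fragment for genmake2 (cf. the existing
# SUPER-UX_SX-8_sxf90+mpi_awi file mentioned below).  Only "-C hopt" is
# taken from this thread; the compiler name here is an assumption.
FC='sxmpif90'
FOPTIM='-C hopt'   # NEC's high-level optimization, including vectorization
```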
>> On Jun 7, 2007, at 5:41 PM, Martin Losch wrote:
>>> Hi Patrick et al.
>>> I am sorry to have expressed my current experience with our NEC SX8 
>>> the way I did. For fairly large problems (what is "fairly large"? 
>>> 300x300 horizontal points is good, 45x45 is not, 180x108 is still OK 
>>> but not great) I find the performance to be good. For 1 CPU and 
>>> 300x300x100 points and the latest code I get 5540MFLOPS which is 
>>> approx 15% of the theoretical peak performance. This does not involve 
>>> exf nor any seaice pkg, nor does it involve any fancy tweaking.
>>> I have not yet found the optimal set of compiler flags, if anyone 
>>> knows anything better than what I have put into 
>>> SUPER-UX_SX-8_sxf90+mpi_awi I'd love to get (and try) them.
>>> Samar, for your domain, you probably do not need any domain 
>>> decomposition (it makes the inner loops too small). I am using 
>>> bio-geochemical code in which the loops are "the wrong way around", 
>>> that is, the k-loop is the innermost loop, as is commonly the case 
>>> with these models; e.g., the DIC pkg will have this problem and will 
>>> slow down the code dramatically. In my case, the gchem-related code 
>>> (not part of the cvs-repository), which takes less than 10% of the 
>>> time on a parallel machine with amd64 CPUs, takes 80% of the total 
>>> time on the NEC. I have tried to inline a few routines within the 
>>> k-loop; that enables vectorization of the (short, length 23) k-loop, 
>>> which already reduces the CPU time spent in that routine by a factor 
>>> of 2. But in the end I
>>> will have to rewrite this part of the code. But the other parts (even 
>>> seaice and exf) do not seem to be a terrible problem.
>>> Martin
>>> On 7 Jun 2007, at 16:41, Patrick Heimbach wrote:
>>>> Hi Jens-Olaf, Samar, Martin,
>>>> Jens-Olaf, thanks for pointing out that there are
>>>> MITgcm setups which are running efficiently on some SX platforms
>>>> (there have been rumors out there that it couldn't be done).
>>>> It would be great if you could share your experience as to
>>>> what it took to achieve this efficiency, or known bottlenecks,
>>>> maybe in working together with Martin at AWI and Samar at Kiel.
>>>> We think that, in theory, there is little that should prevent the
>>>> MITgcm from vectorizing efficiently, provided the domain
>>>> decomposition is chosen accordingly (long inner loops).
>>>> My understanding is that compiler optimization will also seek to
>>>> collapse inner loops if possible (no index dependencies)
>>>> to extend inner loop lengths.
>>>> Again, this should work for many (but not all) subroutines.
>>>> Some problematic code for vectorization that I am aware of
>>>> is the seaice package as well as the bulk formula code in exf.
>>>> But Samar isn't using those packages.
>>>> If needed we can move this conversation to the devel list.
>>>> Cheers
>>>> -Patrick
>>>> On Jun 7, 2007, at 10:21 AM, Samar Khatiwala wrote:
>>>>> Hi Jens-Olaf
>>>>> This is a bit off topic, but just to follow up on your post:
>>>>> I am currently running the MITgcm on an SX-8 at uni-kiel. This is a 
>>>>> coarse resolution configuration (128 x 64 x 15).
>>>>> Unfortunately, performance has not been so great and, I am told by 
>>>>> other users of the SX, is significantly below the machine's 
>>>>> theoretical peak, even less than the 25-35% figure you quote.
>>>>> Perhaps you can advise, off-list, on how I can improve things.
>>>>> Thanks
>>>>> Samar
>>>>> On Jun 7, 2007, at 2:28 PM, Jens-Olaf Beismann wrote:
>>>>>> Dear MITgcm users,
>>>>>>> This "philosophy" works fairly well for most single-CPU and 
>>>>>>> parallel computer architectures, although I am now struggling 
>>>>>>> with a vector computer for which the MITgcm is only efficient if 
>>>>>>> the horizontal domain size is fairly large (because the code 
>>>>>>> generally excludes vectorization in the vertical dimension, and 
>>>>>>> that's not likely to change).
>>>>>> just a quick comment regarding the use of the MITgcm on vector 
>>>>>> machines: I'm not familiar with Martin's application, but I know 
>>>>>> several MITgcm configurations which are used very efficiently on 
>>>>>> SX machines at other computing centres. These are "medium-sized" 
>>>>>> regional ocean models, and they typically run at approx. 25-35% of 
>>>>>> the theoretical peak performance.
>>>>>> As Martin pointed out, it is necessary to have a completely 
>>>>>> vectorised code to achieve good vector performance, but there is 
>>>>>> no general problem in running the MITgcm on a vector machine.
>>>>>> Best regards,
>>>>>> Jens-Olaf
>>>>>> -- 
>>>>>> Dr. Jens-Olaf Beismann           Benchmarking Analyst
>>>>>> NEC High Performance Computing Europe GmbH
>>>>>> Prinzenallee 11, D-40549 Duesseldorf, Germany
>>>>>> Tel: +49 4326 288859 (office)  +49 160 183 5289 (mobile)
>>>>>> Fax: +49 4326 288861              http://www.hpce.nec.com
>>>>>> _______________________________________________
>>>>>> MITgcm-support mailing list
>>>>>> MITgcm-support at mitgcm.org
>>>>>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>>>> ---
>>>> Dr Patrick Heimbach | heimbach at mit.edu | http://www.mit.edu/~heimbach
>>>> MIT | EAPS, 54-1518 | 77 Massachusetts Ave | Cambridge, MA 02139, USA
>>>> FON: +1-617-253-5259 | FAX: +1-617-253-4464 | SKYPE: patrick.heimbach
