[MITgcm-support] inefficient pressure solver

Martin Losch Martin.Losch at awi.de
Tue Jul 14 10:16:09 EDT 2009


The memory bandwidth problem appears as soon as you use more than 1 or
2 cores per quad-core unit, so what David is seeing here is probably
something different, because it looks like he is running with fully
loaded nodes in every case (so bandwidth saturation would hit all of
his runs about equally), right?
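
(In case you want to check the bandwidth saturation anyway: a simple
STREAM-style triad loop, run on one node with increasing thread counts,
shows where the per-socket memory bandwidth stops scaling. This is only
a rough illustration in plain C with OpenMP, not MITgcm code; the array
size is just chosen to be much larger than the caches.)

/* Illustrative STREAM-style triad (not MITgcm code): run on one node
   with OMP_NUM_THREADS=1,2,4,8 to see where the per-socket memory
   bandwidth stops scaling.  Build with: gcc -O2 -fopenmp triad.c -o triad */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 20000000L   /* 3 arrays x 160 MB, well beyond the caches */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    const int nRep = 20;
    double t0 = omp_get_wtime();
    for (int r = 0; r < nRep; r++) {
#pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];    /* read b,c and write a */
    }
    double t = omp_get_wtime() - t0;

    /* the triad touches 3 x 8 bytes per element per repetition */
    double gbytes = 3.0 * 8.0 * (double)N * nRep / 1.0e9;
    printf("threads=%d  bandwidth ~ %.1f GB/s\n",
           omp_get_max_threads(), gbytes / t);

    free(a); free(b); free(c);
    return 0;
}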

cg2d does two 2D exchanges and two global sums per iteration, and I
suspect that one (or both) of these operations is very expensive on
your system. Can you do a flow-trace analysis that lets you see where
the time is actually spent? If I am right, it is not spent in the
routine cg2d itself but in the MPI routines (mpi_send/recv/allreduce,
depending on which flags you are using; you can change the behavior a
little by defining the appropriate flags in CPP_EEOPTIONS.h).
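
If it helps, below is a rough stand-alone timing sketch (plain C with
MPI, not MITgcm code; the 224-point message size is only a guess at a
tile edge of your 360x224 grid) that times a one-double MPI_Allreduce
and a small MPI_Sendrecv exchange, i.e. the two kinds of operations
cg2d depends on. Running it at the same core counts as the model runs
should show which of the two blows up on your interconnect:

/* Hypothetical micro-benchmark (not part of MITgcm): times a global sum
 * (MPI_Allreduce of one double) and a small halo-like nearest-neighbour
 * exchange (MPI_Sendrecv), so their cost can be compared across core
 * counts and machines.  Build with: mpicc -O2 cg2d_mpi_test.c */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int nIter = 10000, nEdge = 224;
    int rank, size;
    double s = 1.0, g;
    double sbuf[224] = {0.0}, rbuf[224];
    double t0, tSum, tExch;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* global sum, as in cg2d's dot products / residual norm */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < nIter; i++)
        MPI_Allreduce(&s, &g, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    tSum = MPI_Wtime() - t0;

    /* nearest-neighbour exchange, as in the 2D halo updates */
    int right = (rank + 1) % size, left = (rank - 1 + size) % size;
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < nIter; i++)
        MPI_Sendrecv(sbuf, nEdge, MPI_DOUBLE, right, 0,
                     rbuf, nEdge, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    tExch = MPI_Wtime() - t0;

    if (rank == 0)
        printf("np=%d  allreduce %.2f us/call  sendrecv %.2f us/call\n",
               size, 1e6 * tSum / nIter, 1e6 * tExch / nIter);

    MPI_Finalize();
    return 0;
}

If the allreduce time per call grows strongly with the number of ranks
while the sendrecv time stays flat, that points at the global sum
rather than the exchanges.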

Martin

On Jul 14, 2009, at 3:53 PM, David Hebert wrote:

> David,
>
> I recall a discussion earlier in the year about difficulties with
> quad-core processors and memory bandwidth. Could this be what you are
> seeing as you increase cores?
>
> David
>
> David Wang wrote:
>> Hi MITgcmers,
>>
>> We have experienced problems with MITgcm on a small local cluster
>> (24 nodes, each with two quad-core AMD Opteron "Shanghai" processors,
>> connected by Infiniband, running OpenMPI 1.3.2). The symptom is that
>> as we increase the number of processors (nProcs), the pressure solver
>> cg2d takes a progressively larger share (SOLVE_FOR_PRESSURE in
>> STDOUT.0000) of the total walltime (ALL in STDOUT.0000), and this
>> share is much larger than on other clusters (specifically TACC's
>> Ranger and Lonestar).
>>
>> Some 1-year hydrostatic, implicit free-surface test runs on a
>> 360x224x46 grid with asynchronous timestepping (1200.s/43200.s) give
>> the following statistics:
>>
>> nodes   cores   ALL (sec)   SOLVE_FOR_PRESSURE (sec)   SOLVE_FOR_PRESSURE/ALL (%)
>>   1       8       1873                93                         4.97
>>   2      16        922               129                        13.99
>>   4      32        682               310                        45.45
>>
>> And with 96 cores, this percentage soars to about 80%!
>>
>> However, our experience on TACC's Ranger and Lonestar shows that
>> this percentage does increase with the number of processors, but it
>> never exceeds 40%. TACC's machines use MVAPICH, so we also tested
>> MVAPICH on our local cluster, with no better luck.
>>
>> We have no idea why the cg2d pressure solver runs so inefficiently  
>> on our cluster. If anyone can kindly provide a few clues, we will  
>> very much appreciate them.
>>
>> Thanks,
>> David
>>
>> -- 
>> turn and live.
>> ------------------------------------------------------------------------
>>
>



