[MITgcm-support] inefficient pressure solver

David Wang climater at gmail.com
Tue Jul 14 10:50:12 EDT 2009


Hi Martin and David,

Thanks for the responses. I think Martin got me right: I use all 8 cores (2
CPUs with 4 cores each) in a compute node.

It seems that, as Martin pointed out, MPI is not doing the right thing. I'm
no expert on this. I have been using the default, #undef GLOBAL_SUM_SEND_RECV,
both on our cluster and on TACC's Ranger and Lonestar (perhaps TACC has done
some relevant MPI tuning by default?). But I see that one of the verification
experiments (verification/global_ocean.90x40x15/code/CPP_EEOPTIONS.h) has:

#define GLOBAL_SUM_SEND_RECV
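
If I understand the flag correctly, it only changes how the global sums are
assembled: either as a single MPI collective, or as explicit point-to-point
messages gathered on one process and broadcast back. In plain MPI/C terms
(my own sketch, not the actual MITgcm Fortran), the difference is roughly:

#include <mpi.h>

/* default (#undef GLOBAL_SUM_SEND_RECV): one collective reduction */
double global_sum_collective(double local, MPI_Comm comm)
{
    double global;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
    return global;
}

/* #define GLOBAL_SUM_SEND_RECV: explicit sends/receives to rank 0,
   sum there, then broadcast the result back */
double global_sum_send_recv(double local, MPI_Comm comm)
{
    int rank, size, p;
    double global = local, v;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    if (rank == 0) {
        for (p = 1; p < size; p++) {
            MPI_Recv(&v, 1, MPI_DOUBLE, p, 0, comm, MPI_STATUS_IGNORE);
            global += v;
        }
    } else {
        MPI_Send(&local, 1, MPI_DOUBLE, 0, 0, comm);
    }
    MPI_Bcast(&global, 1, MPI_DOUBLE, 0, comm);
    return global;
}

If OpenMPI's allreduce is doing something unfortunate across the two sockets
of a node, the explicit version could well behave differently here.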

I will try this. Thanks a lot!
D.
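
P.S. Martin, regarding the flow trace: before setting up a full tracing tool
I will probably just time the global sums directly with mpi_wtime to see
whether that is really where the walltime goes. A rough sketch of the kind of
instrumentation I mean (a standalone C/MPI test, not MITgcm code; the loop
count is arbitrary):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i;
    double local = 1.0, global, t0, t_sum = 0.0, t_max;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* stand-in for the small global sums done in each cg2d iteration */
    for (i = 0; i < 1000; i++) {
        t0 = MPI_Wtime();
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        t_sum += MPI_Wtime() - t0;
    }

    /* the slowest rank sets the pace of the solver */
    MPI_Reduce(&t_sum, &t_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("max time in 1000 8-byte allreduces: %g s\n", t_max);

    MPI_Finalize();
    return 0;
}

Run on 8, 16, and 32 cores, this should show quickly whether the small
reductions themselves scale badly on our fabric.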

On Tue, Jul 14, 2009 at 10:16 AM, Martin Losch <Martin.Losch at awi.de> wrote:

> The memory bandwidth problem appears as soon as you use more than 1 or 2
> cores per quad-core unit, so what David is seeing here is probably something
> different, because it looks like he is running with fully loaded nodes,
> right?
>
> cg2d does 2 2D exchanges and 2 global sums per iteration. I suspect that
> one (or both) of these operations is very expensive on your system. Can you
> do a flow trace analysis that lets you see where the time is actually
> spent? If I am right, it is not spent in the routine cg2d itself but in the
> MPI routines (mpi_send/recv/allreduce; whichever flags you are using, you
> can change the behavior a little by defining the appropriate flags in
> CPP_EEOPTIONS.h).
>
> Martin
>
>
> On Jul 14, 2009, at 3:53 PM, David Hebert wrote:
>
>  David,
>>
>> I recall discussion earlier in the year about difficulties with quad core
>> processors and memory bandwidth. Could this be what you are seeing as you
>> increase cores?
>>
>> David
>>
>> David Wang wrote:
>>
>>> Hi MITgcmers,
>>>
>>> We have experienced problems with MITgcm on a small local cluster
>>> (24 nodes, each with two quad-core AMD Opteron "Shanghai" CPUs,
>>> InfiniBand interconnect, OpenMPI 1.3.2). The symptom is that as we
>>> increase the number of processors (nProcs), the pressure solver cg2d
>>> takes a progressively larger share (SOLVE_FOR_PRESSURE in STDOUT.0000)
>>> of the total walltime (ALL in STDOUT.0000), and this share is much
>>> larger than on other clusters (specifically TACC's Ranger and Lonestar).
>>>
>>> Some 1-year hydrostatic, implicit free-surface test runs on a
>>> 360x224x46 grid with asynchronous timestepping (1200.s/43200.s) give
>>> the following statistics:
>>>
>>> nodes  cores  ALL (sec)  SOLVE_FOR_PRESSURE (sec)  SOLVE_FOR_PRESSURE/ALL (%)
>>>     1      8       1873                        93                        4.97
>>>     2     16        922                       129                       13.99
>>>     4     32        682                       310                       45.45
>>>
>>> And with 96 cores, this percentage soars to about 80%!
>>>
>>> However, our experience with TACC's Ranger and Lonestar shows that this
>>> percentage also increases with the number of processors, but it never
>>> goes above 40%. TACC's machines use MVAPICH, so we also tested MVAPICH
>>> on our local cluster, but with no better luck.
>>>
>>> We have no idea why the cg2d pressure solver runs so inefficiently on
>>> our cluster. If anyone can kindly provide a few clues, we would very
>>> much appreciate it.
>>>
>>> Thanks,
>>> David
>>>
>>> --
>>> turn and live.
>>>



-- 
turn and live.