[MITgcm-support] inefficient pressure solver
Martin Losch
Martin.Losch at awi.de
Tue Jul 14 10:16:09 EDT 2009
The memory bandwidth problem appears as soon as you use more than 1 or
2 cores per quad-core unit, so what David is seeing here is probably
something different, because it looks like he is running with fully
loaded nodes, right?
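If you want to check the bandwidth explanation directly, a rough probe
is to run one copy of a triad loop per core and watch whether the
per-process rate collapses as the socket fills up. This is only a
sketch (not the STREAM benchmark itself), and the array size is an
assumption that just has to be well beyond the cache:

/* triad.c: crude per-core memory bandwidth probe (a sketch, not STREAM).
 * Build with "mpicc -O2 -o triad triad.c" and run e.g.
 * "mpirun -np 1 ./triad", then "-np 4", "-np 8" on one node.          */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N    20000000L   /* ~160 MB per array, far larger than the caches */
#define REPS 10

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  double *a = malloc(N * sizeof *a);
  double *b = malloc(N * sizeof *b);
  double *c = malloc(N * sizeof *c);
  for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

  MPI_Barrier(MPI_COMM_WORLD);              /* start all ranks together */
  double t0 = MPI_Wtime();
  for (int r = 0; r < REPS; r++)
    for (long i = 0; i < N; i++)
      a[i] = b[i] + 3.0 * c[i];             /* 3 doubles of traffic per element */
  double dt = MPI_Wtime() - t0;

  printf("rank %d: %.2f GB/s (a[0]=%g)\n", rank,
         REPS * 3.0 * N * sizeof(double) / dt / 1e9, a[0]);

  free(a); free(b); free(c);
  MPI_Finalize();
  return 0;
}

If the per-rank number with 8 ranks per node is much lower than with 1
or 2, memory bandwidth is at least part of the story.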
cg2d does 2 2D exchanges and 2 global sums per iteration. I suspect
that one (or both) of these operations is very expensive on your
system. Can you do a flow trace analysis that lets you see where the
time is actually spent? If I am right, it is not spent in the routine
cg2d itself but in the MPI routines (mpi_send/recv/allreduce, or
whatever your build uses); you can change their behavior a little by
defining the appropriate flags in CPP_EEOPTIONS.h.
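If you do not have a tracing tool (Scalasca, IPM, Vampir, ...) handy,
even a crude PMPI wrapper already tells you how much time goes into the
global sums. This is only a sketch, written against an MPI-2-era mpi.h
such as OpenMPI 1.3.2 (newer MPI versions declare sendbuf as
const void *):

/* trace_allreduce.c: time spent in MPI_Allreduce, reported at finalize.
 * Build:  mpicc -O2 -fPIC -shared -o libtrace.so trace_allreduce.c
 * Run:    mpirun -x LD_PRELOAD=./libtrace.so -np 32 ./mitgcmuv        */
#include <stdio.h>
#include <mpi.h>

static double t_allreduce = 0.0;
static long   n_allreduce = 0;

/* Interpose MPI_Allreduce and forward to the PMPI entry point,
 * accumulating the wall-clock time spent inside the call.            */
int MPI_Allreduce(void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
  double t0 = MPI_Wtime();
  int rc = PMPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm);
  t_allreduce += MPI_Wtime() - t0;
  n_allreduce++;
  return rc;
}

/* Print the per-rank totals when the model shuts MPI down. */
int MPI_Finalize(void)
{
  int rank;
  PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
  fprintf(stderr, "rank %d: %ld MPI_Allreduce calls, %.2f s total\n",
          rank, n_allreduce, t_allreduce);
  return PMPI_Finalize();
}

If most of the cg2d time shows up here, the global sum is the problem;
if not, it is probably the exchanges. One knob to try in that case is
the GLOBAL_SUM_SEND_RECV flag in CPP_EEOPTIONS.h (if your MITgcm
version has it), which replaces the allreduce-based global sum with
explicit send/receive calls.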
Martin
On Jul 14, 2009, at 3:53 PM, David Hebert wrote:
> David,
>
> I recall discussion earlier in the year about difficulties with
> quad-core processors and memory bandwidth. Could this be what you
> are seeing as you increase cores?
>
> David
>
> David Wang wrote:
>> Hi MITgcmers,
>>
>> We have experienced problems with MITgcm on a small local cluster
>> (24 nodes, each with two quad-core AMD Opteron "Shanghai"
>> processors, Infiniband interconnect, OpenMPI 1.3.2). The symptom
>> is that as we increase the number of processors (nProcs), the
>> pressure solver cg2d takes a progressively larger share
>> (SOLVE_FOR_PRESSURE in STDOUT.0000) of the total walltime (ALL in
>> STDOUT.0000), and this percentage is much larger than on other
>> clusters (specifically TACC's Ranger and Lonestar).
>>
>> Some 1-year hydrostatic, implicit free-surface test runs on a
>> 360x224x46 grid with asynchronous timestepping (1200 s / 43200 s)
>> give the following statistics:
>>
>> nodes  cores  ALL (s)  SOLVE_FOR_PRESSURE (s)  SOLVE_FOR_PRESSURE/ALL (%)
>>   1      8     1873            93                       4.97
>>   2     16      922           129                      13.99
>>   4     32      682           310                      45.45
>>
>> And with 96 cores, this percentage soars to about 80%!
>>
>> However, our experience with TACC's Ranger and Lonestar shows that
>> this percentage does increase with the number of processors, but it
>> never rises above 40%. TACC's machines use MVAPICH, so we also
>> tested MVAPICH on our local cluster, but had no better luck.
>>
>> We have no idea why the cg2d pressure solver runs so inefficiently
>> on our cluster. If anyone can kindly provide a few clues, we would
>> very much appreciate it.
>>
>> Thanks,
>> David
>>
>> --
>> turn and live.