[MITgcm-support] Fwd: MPI rules to save compute time?
Matthew Mazloff
mmazloff at ucsd.edu
Mon Nov 14 11:44:49 EST 2011
Hello
Yes -- square is good -- so you could try 20x21x8 -- but then the halo
overlap is a much larger fraction of each tile...
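On the 80x42 exp4 grid, 20x21 tiles would mean nPx=4, nPy=2, i.e. 8
processes. A minimal SIZE.h sketch for that layout is below -- the
OLx/OLy overlap widths are placeholders (keep whatever your current
SIZE.h uses), and the real file declares each variable separately:

C     Hypothetical SIZE.h fragment: 20x21 tiles, 4x2 = 8 MPI processes
C     on the 80x42x8 exp4 domain; overlap widths are placeholders.
      INTEGER sNx, sNy, OLx, OLy, nSx, nSy, nPx, nPy, Nx, Ny, Nr
      PARAMETER (
     &           sNx =  20,
     &           sNy =  21,
     &           OLx =   2,
     &           OLy =   2,
     &           nSx =   1,
     &           nSy =   1,
     &           nPx =   4,
     &           nPy =   2,
     &           Nx  = sNx*nSx*nPx,
     &           Ny  = sNy*nSy*nPy,
     &           Nr  =   8 )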
Also, if you are doing I/O while running, you have to consider how it
is done. 4 processors may mean more writes -- and maybe you are seeing
latency issues with your filesystem.
You can try useSingleCpuIO = .TRUE. in your data file -- which helps
on some machines, though hurts on others
(or, if you already have it set to true, you could try it set to false).
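A minimal sketch of where the flag goes, assuming the usual layout of
the runtime data file -- it sits in the PARM01 namelist, next to
whatever entries you already have there:

 &PARM01
# ... your existing PARM01 entries ...
 useSingleCpuIO=.TRUE.,
 &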
You probably aren't pushing per-processor memory limits with Nr only
equal to 8 -- so you could also try multi-threading to see how that
affects performance....
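A sketch of the threading setup, assuming 2 threads per process: the
thread counts go in the EEPARMS namelist of the eedata file, and SIZE.h
would also need nSx to be a multiple of nTx (and nSy of nTy) so every
thread has at least one tile, plus a build with threading enabled.

 &EEPARMS
# 2 threads per process, splitting the tiles in x
 nTx=2,
 nTy=1,
 &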
-Matt
On Nov 14, 2011, at 8:25 AM, Holly Dail wrote:
> Hi Chunyan -
>
> A couple of ideas ...
>
> - you can expect the best performance for tiles that are more square
> than rectangular -- that minimizes communication of boundary regions
> vs. computation on interior points of the tile. So you'd expect
> tiles of shape 40x42 to perform somewhat better than 80x21.
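>
> A rough back-of-envelope, assuming for illustration an overlap width
> of 2 points and two exchanged edges per tile (both shapes have the
> same 80*21 = 40*42 = 1680 interior points):
>
>   80x21 tile: edges of 80 points -> ~ 2 * 2 * 80 = 320 halo points
>   40x42 tile: edges of 42 points -> ~ 2 * 2 * 42 = 168 halo points
>
> so the squarer tile exchanges roughly half as much boundary data per
> interior point.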
>
> - due to the cost of communicating boundary regions between
> processes, there is a limit beyond which it doesn't make much sense
> to add more processors. Your domain is pretty small, and I wouldn't
> be surprised if you see greatly diminished returns with tiles
> smaller than about 40x42x8 (that explains why you only saved 17% of
> the runtime when you went from 2 to 4 processors).
>
> I hadn't been able to explain why run1 did not perform somewhat
> better than run0, but I've just seen Dmitri's email, so perhaps
> that completes the picture.
>
> Holly
>
> On Nov 14, 2011, at 10:47 AM, Chun-Yan Zhou wrote:
>
>> Dear all,
>> I am using verification/exp4 to test MPI.
>> The domain is Nx=80, Ny=42 and Nr=8.
>> I ran the same number of iterations with different nPx and nPy
>> values; the data files are otherwise identical.
>> &PARM04
>> usingCartesianGrid=.TRUE.,
>> delX=80*5.e3,
>> delY=42*5.e3,
>> delR= 8*562.5,
>> &
>>          sNx   sNy   nPx   nPy   processes   time cost
>>  run0     80    42     1     1       1        20 mins
>>  run1     80    21     1     2       2        20 mins
>>  run2     40    42     2     1       2        12 mins
>>  run3     40    21     2     2       4        10 mins
>>
>> Comparing run0 and run1, nPy=2 did not speed up the computation,
>> but the nPx=2 case (run2) is much faster than nPx=1 (run0). Since
>> the grid spacing is the same in the X and Y directions, what causes
>> the difference in compute time? Any ideas? Are there any rules for
>> choosing nPx and nPy so as to save time?
>>
>> Thanks in advance!
>> chunyan
>>
>>
>
>
>