[MITgcm-support] Fwd: MPI rules to save compute time?

Matthew Mazloff mmazloff at ucsd.edu
Mon Nov 14 11:44:49 EST 2011


Hello

Yes -- square is good -- so you could try 20x21x8 -- but then you have
a lot of overlap relative to the interior computation...
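
(Rough arithmetic, just to illustrate the overlap cost, ignoring the
actual halo width OLx/OLy: the exchanges scale roughly with the tile
perimeter while the work scales with the tile area.  For a 40x42 tile
that ratio is 2*(40+42)/(40*42) ~ 0.10, for 80x21 it is
2*(80+21)/(80*21) ~ 0.12, and for 20x21 it is 2*(20+21)/(20*21) ~ 0.20
-- so a 20x21 tile spends roughly twice the fraction of its effort on
boundaries that a 40x42 tile does.)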

Also, if you are doing I/O while running, you have to consider how it
is done.

4 processors may mean more writes -- and maybe you are seeing latency  
issues with your filesystem.

You can try setting useSingleCpuIO=.TRUE. in your data file -- it helps
on some machines, though hurts on others
(or, if you already have it set to .TRUE., you could try setting it to .FALSE.).
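
For example, a minimal sketch (assuming the standard "data" namelist
layout -- keep whatever else you already have in PARM01):

 &PARM01
  useSingleCpuIO=.TRUE.,
 &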

You probably aren't pushing memory limits per processor with Nr only
equal to 8 -- so you could also try running multi-threaded to see how
that affects performance....
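
A minimal sketch of a 2-threads-per-process setup for your 80x42x8
domain (just an illustration -- the exact SIZE.h values and build
options depend on your setup, and the model needs to be built with
threading enabled, e.g. genmake2 -omp):

 In SIZE.h (2 MPI processes, 2 tiles per process):
      sNx =  40,  sNy =  21,
      nSx =   1,  nSy =   2,
      nPx =   2,  nPy =   1,
 In eedata (nTx*nTy threads must match the nSx*nSy tiles per process):
      &EEPARMS
       nTx=1,
       nTy=2,
      &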

-Matt



On Nov 14, 2011, at 8:25 AM, Holly Dail wrote:

> Hi Chunyan -
>
> A couple of ideas ...
>
> - you can expect the best performance for tiles that are more square  
> than rectangular -- that minimizes communication of boundary regions  
> vs. computation on interior points of the tile.  So you'd expect  
> tiles of shape 40x42 to perform somewhat better than 80x21.
>
> - due to the cost of communicating boundary regions between  
> processes, there is a limit beyond which it doesn't make much sense  
> to add more processors.  Your domain is pretty small, and I wouldn't  
> be surprised if you see greatly diminished returns with tiles  
> smaller than about 40x42x8 (that explains why you only saved 17% of  
> the runtime when you went from 2 to 4 processors).
>
> I hadn't been able to explain why run1 did not perform somewhat
> better than run0, but I've just seen Dmitris' email, so perhaps that
> completes the picture.
>
> Holly
>
> On Nov 14, 2011, at 10:47 AM, Chun-Yan Zhou wrote:
>
>> Dear all,
>> I use verification/exp4 to test the MPI setup.
>> The grid is Nx=80, Ny=42 and Nr=8.
>> I ran the same number of iterations with different nPx and nPy
>> values, and the data files are the same.
>> &PARM04
>> usingCartesianGrid=.TRUE.,
>> delX=80*5.e3,
>> delY=42*5.e3,
>> delR= 8*562.5,
>> &
>>          sNx   sNy   nPx   nPy   processes   time cost
>> run0      80    42     1     1       1        20 mins
>> run1      80    21     1     2       2        20 mins
>> run2      40    42     2     1       2        12 mins
>> run3      40    21     2     2       4        10 mins
>>
>> Comparing run0 and run1, nPy=2 didn't speed up the computation,
>> but the nPx=2 case is much faster than nPx=1 (run0 vs. run2). Since
>> the grid spacings are the same in the X and Y directions, what causes
>> the compute time difference? Any ideas? Are there any rules for
>> assigning nPx and nPy in order to save time?
>>
>> Thanks in advance!
>> chunyan
>>
>>
>> The University of Dundee is a registered Scottish charity, No:  
>> SC015096
>>
>
>
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support



