[MITgcm-support] changing number of processors

Menemenlis, Dimitris (329D) Dimitris.Menemenlis at jpl.nasa.gov
Thu Mar 5 06:57:08 EST 2015


My personal prejudice (and it may be wrong): if you want efficient I/O, you need to get rid of the netcdf package; mdsio and its extensions are a lot more flexible and efficient.  In any case there is no need to guess about the cause of the bottleneck; just look at the timings at the end of your STDOUT.0000 file.
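
If you do want to go that route, the switch is mostly a run-time one.  A minimal sketch of the relevant namelist entries is below; the flag names are written from memory as an illustration only, so please check pkg/mnc and your own data.pkg/data.mnc before relying on them:

 # data.pkg: turn the NetCDF (mnc) package off altogether and fall
 # back to the default mdsio .data/.meta output
  &PACKAGES
   useMNC = .FALSE.,
  &

 # or, with mnc still compiled in, data.mnc can route individual
 # output streams back to mdsio (again, flag names from memory)
  &MNC_01
   snapshot_mnc     = .FALSE.,
   timeave_mnc      = .FALSE.,
   monitor_mnc      = .FALSE.,
   pickup_write_mnc = .FALSE.,
  &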

On Mar 5, 2015, at 3:20 AM, Jonny Williams <Jonny.Williams at bristol.ac.uk> wrote:

As a related question to this thread, is it possible to output one NetCDF file per stream (state*.nc, ptracers*.nc, etc.) rather than one per process?

I am currently running on ARCHER, the UK national supercomputing facility, and I am not getting the speed-up I expect for a long job, whereas I did get the expected speed-up for a very short test job.

I am thinking that the I/O may be the bottleneck here?

Cheers!

Jonny

On 10 February 2015 at 07:38, Martin Losch <Martin.Losch at awi.de> wrote:
Hi Jonny and others,

I am not sure if I understand your question about "the utility of the overlap cells": the overlaps are filled with the values of the neighboring tiles so that you can compute terms of the model equations near the tile boundary; without the overlap you would not be able to evaluate any horizontal gradient or average at the edge of a tile.
The size of the overlap depends on the computational stencil that you want to use. A 2nd-order operator needs an overlap of 1, a 3rd-order operator needs an overlap of 2, and so forth. I think the model tells you when your choice of advection schemes requires more overlap than you have specified.
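
To make that concrete (purely an illustration, not a recommendation): in SIZE.h that just means setting, e.g.,

     &           OLx =   3,
     &           OLy =   3,

for anything up to a 4th-order operator by the rule of thumb above, with the rest of SIZE.h unchanged.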

Martin

PS:
Here’s my experience with scaling or not scaling (by no means are these absolute numbers or recommendations):
As a rule of thumb, the MITgcm dynamics/thermodynamics kernel (various packages may behave differently) usually scales nearly linearly down to tile sizes (sNx * sNy) of about 30*30, where the overhead of the overlap relative to the tile size becomes unfavorable (too many local communications between individual tiles) and the global pressure solver takes its toll (global communications for which all processes have to wait). Below this tile size the time to solution still decreases with more processors, but more and more slowly, until the overhead costs more than the speedup gains. To reiterate what Matt already wrote: for a 30x30 tile the overlap contains nearly 2*(OLx*sNy + OLy*sNx) cells, so for an overlap of 2 you already have 8*30 = 240 cells in the overlap, more than one quarter of the 900 cells in the interior. From this point of view a tile size of 2x2 with a 1-gridpoint overlap is totally inefficient.
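
If you want to play with these numbers yourself, here is a quick standalone back-of-envelope program (plain Fortran, nothing MITgcm-specific; the tile sizes and the overlap width of 2 are just examples):

      PROGRAM HALO_COST
C     Ratio of overlap (halo) cells to interior cells for square
C     tiles, including the four corner blocks of the halo.
      IMPLICIT NONE
      INTEGER OL, S, I
      INTEGER SIZES(4)
      DATA SIZES / 60, 30, 10, 2 /
      DATA OL / 2 /
      DO I = 1, 4
        S = SIZES(I)
C       halo = 2*(OLx*sNy + OLy*sNx) + 4*OLx*OLy corner cells
        WRITE(*,'(A,I3,A,F6.2)') ' tile size ', S,
     &    ', halo/interior = ',
     &    REAL(2*(OL*S + OL*S) + 4*OL*OL) / REAL(S*S)
      ENDDO
      END

For a 30x30 tile this gives a ratio of about 0.28, i.e. more than a quarter of the interior again in halo cells, and it blows up for tiny tiles.
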
Further, it is probably better to have nearly quadratic tiles (sNx ~ sNy), except on vector machines, where you try to make sNx as large as possible (at least until you reach the maximum vector length of your machine).

In my experience you need to test this on every new computer that you have access to, to find the range of processor counts that you can run with efficiently. For example, it may be more economical to use fewer processors and wait a little longer for the result, but have enough CPU time left to do a second run of the same type, than to spend all your CPU time on a run with twice as many processors that finishes faster, but not twice as fast, because the linear-scaling limit has been reached.

> On 09 Feb 2015, at 16:05, Jonny Williams <Jonny.Williams at bristol.ac.uk> wrote:
>
> Dear Angela, Matthew
>
> Thank you very much for your emails.
>
> For your information, I have now gotten round my initial problem of the NaNs by using a shorter timestep, although I don't know why this would have made much difference...
>
> Your discussion about the overlap parameters and run speed is of interest to me because I found that a decrease in timestep by a factor of 4 and an increase in the number of processors by a factor of 10 resulted in an almost identical run speed!
>
> My SIZE.h parameters were as follows...
>
> PARAMETER (
>      &           sNx =  75,
>      &           sNy =  10,
>      &           OLx =   4,
>      &           OLy =   4,
>      &           nSx =   1,
>      &           nSy =   1,
>      &           nPx =   6,
>      &           nPy =   80,
>      &           Nx  = sNx*nSx*nPx,
>      &           Ny  = sNy*nSy*nPy,
>      &           Nr  =   50)
>
> ... so (using the calculation from the earlier email) I have (4+75+4)*(4+10+4) = 1494 grid cells per process, of which 75*10 = 750 (about 50%) are cells I care about.
>
> This is really good to know, but it got me to thinking: what is the utility of these overlap cells in the first place?
>
> Many thanks!
>
> Jonny




--
Dr Jonny Williams
School of Geographical Sciences
Cabot Institute
University of Bristol
BS8 1SS

+44 (0)117 3318352
jonny.williams at bristol.ac.uk
http://www.bristol.ac.uk/geography/people/jonny-h-williams