[MITgcm-support] changing number of processors
Jonny Williams
Jonny.Williams at bristol.ac.uk
Mon Mar 16 09:58:20 EDT 2015
Thanks for this, Dimitris.
The timings at the end of the STDOUT.0000 file are extremely useful and
have allowed me to confirm that I/O is indeed the limiting factor in my
runs at the moment.
For example, in going from 48 (6x8) to 480 (15x32) processors, the time
spent in the section called "DO_THE_MODEL_IO [FORWARD_STEP]" in STDOUT.0000
increased from 3% to 67% of the total run time!
I may well have to look into using the mdsio package again, unless there is
a way around this NetCDF I/O issue?
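In case it is useful to anyone else, something along these lines can pull
those percentages out of STDOUT.0000 automatically. This is only a rough
sketch: the regular expressions assume the timing summary prints lines of
the form 'Seconds in section "NAME"' followed by a 'Wall clock time:' line,
so adjust them to whatever your STDOUT.0000 actually contains.

  import re
  import sys

  # Assumed line formats (check against your own STDOUT.0000):
  #   Seconds in section "DO_THE_MODEL_IO       [FORWARD_STEP]":
  #       Wall clock time:     1234.56
  section_re = re.compile(r'Seconds in section "(.+?)"')
  wall_re = re.compile(r'Wall clock time:\s*([0-9.Ee+-]+)')

  def wall_times(filename):
      """Map each timed section name to its wall-clock seconds."""
      times, current = {}, None
      with open(filename) as f:
          for line in f:
              m = section_re.search(line)
              if m:
                  current = " ".join(m.group(1).split())
                  continue
              m = wall_re.search(line)
              if m and current is not None:
                  times[current] = float(m.group(1))
                  current = None
      return times

  times = wall_times(sys.argv[1] if len(sys.argv) > 1 else "STDOUT.0000")
  total = max(times.values(), default=0.0)   # "ALL [THE_MODEL_MAIN]" should dominate
  for name, secs in sorted(times.items(), key=lambda kv: -kv[1]):
      print("%-45s %12.2f s  %5.1f%%" % (name, secs, 100.0 * secs / max(total, 1e-12)))
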
Many thanks again
Jonny
On 5 March 2015 at 11:57, Menemenlis, Dimitris (329D) <
Dimitris.Menemenlis at jpl.nasa.gov> wrote:
> My personal prejudice (and it may be wrong): if you want efficient I/O,
> you need to get rid of the netcdf package; mdsio and its extensions are a lot
> more flexible and efficient. In any case, there is no need to guess about the
> cause of the bottleneck; just look at the timings at the end of your STDOUT.0000 file.
>
> On Mar 5, 2015, at 3:20 AM, Jonny Williams <Jonny.Williams at bristol.ac.uk>
> wrote:
>
> As a related question to this thread, is it possible to output one
> NetCDF file per stream (state*.nc, ptracers*.nc, etc) rather than one per
> process?
>
> I am currently running on ARCHER, the UK national supercomputing facility,
> and I am not getting the speed-up that I expect for a long job, whereas I
> did get the expected speed-up for a very short test job.
>
> I am thinking that the I/O may be a bottleneck here?
>
> Cheers!
>
> Jonny
>
> On 10 February 2015 at 07:38, Martin Losch <Martin.Losch at awi.de> wrote:
>
>> Hi Jonny and others,
>>
>> I am not sure if I understand your question about "the utility of the
>> overlap cells": the overlaps are filled with the values from the neighboring
>> tiles so that you can compute the terms of the model equations near the
>> boundary; without the overlap you would not be able to evaluate any
>> horizontal gradient or average at the tile edges.
>> The size of the overlap depends on the computational stencil that you want
>> to use. A 2nd-order operation needs an overlap of 1, a 3rd-order operator
>> needs an overlap of 2, and so forth. I think the model tells you when your
>> choice of advection scheme requires more overlap than you have specified.
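>>
>> To make that concrete, here is a toy 1-d sketch in Python (nothing to do
>> with the real exchange code, just the idea): the overlap is filled from the
>> neighbouring tiles, and a centred 2nd-order difference then works right up
>> to the tile edge because that one extra halo value is there.
>>
>> import numpy as np
>>
>> nx, ntiles, olx = 8, 2, 1          # 2nd-order centred difference -> overlap of 1
>> x = np.linspace(0.0, 2.0 * np.pi, nx * ntiles, endpoint=False)
>> u = np.sin(x)
>> dx = x[1] - x[0]
>> tiles = [u[i * nx:(i + 1) * nx] for i in range(ntiles)]
>>
>> def with_halo(i):
>>     # pad tile i with olx cells copied from its neighbours (periodic here,
>>     # purely to keep the toy example short)
>>     left = tiles[(i - 1) % ntiles][-olx:]
>>     right = tiles[(i + 1) % ntiles][:olx]
>>     return np.concatenate([left, tiles[i], right])
>>
>> t = with_halo(0)
>> dudx = (t[2:] - t[:-2]) / (2.0 * dx)   # usable at the tile edges, thanks to the halo
>> print(dudx[:3])                        # compare with the exact derivative:
>> print(np.cos(x[:3]))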
>>
>> Martin
>>
>> PS:
>> Here's my experience with scaling or not scaling (by no means are these
>> absolute numbers or recommendations):
>> As a rule of thumb, the MITgcm dynamics/thermodynamics kernel (various
>> packages may behave differently) usually scales nearly linearly down to tile
>> sizes (sNx * sNy) of about 30*30, at which point the overhead of overlap
>> relative to tile size becomes unfavorable (because of too many local
>> communications between individual tiles) and the global pressure solver
>> takes its toll (because of global communications when all processes have to
>> wait). Below this tile size the time to solution still decreases with more
>> processors, but more and more slowly, until the overhead costs more than the
>> extra processors gain you. To reiterate what Matt already wrote: for a 30x30
>> tile the overlap is nearly 2*(OLx*sNy + OLy*sNx) cells, so for an overlap of
>> 2 you already have 8*30 = 240 cells in the overlap, more than one quarter of
>> the cells in the interior, and so on. From this point of view a tile size of
>> 2x2 with a 1-gridpoint overlap is totally inefficient.
>> Further, it is probably better to have nearly square tiles (sNx ~ sNy),
>> except on vector machines, where you try to make sNx as large as possible
>> (at least until you reach the maximum vector length of your machine).
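>>
>> If you want to play with these numbers, the arithmetic is only a few lines
>> of Python; the tile/overlap combinations below are just the examples from
>> this thread (the 75x10/overlap-4 case matches the SIZE.h quoted further down):
>>
>> def halo_fraction(snx, sny, olx, oly):
>>     # fraction of the allocated (sNx+2*OLx)*(sNy+2*OLy) array that is halo
>>     interior = snx * sny
>>     total = (snx + 2 * olx) * (sny + 2 * oly)
>>     return (total - interior) / total
>>
>> for snx, sny, olx, oly in [(75, 10, 4, 4), (30, 30, 2, 2), (2, 2, 1, 1)]:
>>     f = halo_fraction(snx, sny, olx, oly)
>>     print("%2dx%2d tile, overlap %d: %4.1f%% of allocated cells are halo"
>>           % (snx, sny, olx, 100.0 * f))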
>>
>> In my experience you need to test this on every new computer that you have
>> access to, to find out the range of processor counts that you can run with
>> efficiently. For example, it may be more economical to use fewer processors
>> and wait a little longer for the result, but have enough CPU time left to do
>> a second run of the same type, than to use all your CPU time on a run with
>> twice as many processors, which may finish faster, but not twice as fast
>> because the linear-scaling limit has been reached.
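>>
>> The trade-off is easy to make explicit once you have a couple of test
>> timings; something like this does the bookkeeping (the processor counts and
>> wall times below are made-up placeholders, not measurements):
>>
>> # (processors, wall-clock hours): substitute your own short test runs
>> runs = [(48, 20.0), (96, 11.0), (192, 7.0), (480, 5.5)]
>>
>> p0, t0 = runs[0]                      # smallest run as the reference
>> for p, t in runs:
>>     speedup = t0 / t
>>     efficiency = speedup / (p / p0)   # 1.0 would be perfect linear scaling
>>     cpu_hours = p * t
>>     print("%4d procs: %5.1f h wall, speedup %4.1f, efficiency %4.2f,"
>>           " %7.0f CPU-hours" % (p, t, speedup, efficiency, cpu_hours))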
>>
>> > On 09 Feb 2015, at 16:05, Jonny Williams <Jonny.Williams at bristol.ac.uk>
>> wrote:
>> >
>> > Dear Angela, Matthew
>> >
>> > Thank you very much for your emails.
>> >
>> > For your information, I have now gotten round my initial problem with the
>> NaNs by using a shorter timestep, although I don't know why this would have
>> made much difference...
>> >
>> > Your discussion about the overlap parameters and run speed is of
>> interest to me because I found that a decrease in timestep by a factor of 4
>> and an increase in the number of processors by a factor of 10 resulted in
>> an almost identical run speed!
>> >
>> > My SIZE.h parameters were as follows...
>> >
>> > PARAMETER (
>> > & sNx = 75,
>> > & sNy = 10,
>> > & OLx = 4,
>> > & OLy = 4,
>> > & nSx = 1,
>> > & nSy = 1,
>> > & nPx = 6,
>> > & nPy = 80,
>> > & Nx = sNx*nSx*nPx,
>> > & Ny = sNy*nSy*nPy,
>> > & Nr = 50)
>> >
>> > ... so (using the calculation from the earlier email) I have
>> (4+75+4)*(4+10+4) = 1494 grid cells per process, of which 75*10 = 750, i.e.
>> about 50%, are cells I care about.
>> >
>> > This is really good to know, but it got me to thinking: what is the
>> utility of these overlap cells in the first place?
>> >
>> > Many thanks!
>> >
>> > Jonny
>>
>>
>
>
>
> --
> Dr Jonny Williams
> School of Geographical Sciences
> Cabot Institute
> University of Bristol
> BS8 1SS
>
> +44 (0)117 3318352
> jonny.williams at bristol.ac.uk
> http://www.bristol.ac.uk/geography/people/jonny-h-williams
>
>
--
Dr Jonny Williams
School of Geographical Sciences
Cabot Institute
University of Bristol
BS8 1SS
+44 (0)117 3318352
jonny.williams at bristol.ac.uk
http://www.bristol.ac.uk/geography/people/jonny-h-williams