# [MITgcm-support] changing number of processors

Jonny Williams Jonny.Williams at bristol.ac.uk
Mon Feb 9 10:05:25 EST 2015

```
Dear Angela, Matthew

Thank you very much for your emails.

For your information, I have now gotten round my initial problem of the NaNs
by using a shorter timestep, although I don't know why this would've made a
difference.

Your discussion about the overlap parameters and run speed is of interest
to me because I found that a decrease in timestep by a factor of 4 and an
increase in the number of processors by a factor of 10 resulted in an
almost identical run speed!

My SIZE.h parameters were as follows...

PARAMETER (
&           sNx =  75,
&           sNy =  10,
&           OLx =   4,
&           OLy =   4,
&           nSx =   1,
&           nSy =   1,
&           nPx =   6,
&           nPy =   80,
&           Nx  = sNx*nSx*nPx,
&           Ny  = sNy*nSy*nPy,
&           Nr  =   50)

... so (using the calculation from the earlier email) I have
(4+75+4)*(4+10+4) = 1494 grid cells per process, of which (75*10)/1494 ≈ 50%
are interior (non-overlap) cells.

This is really good to know, but it got me to thinking: what is the purpose
of these overlap cells in the first place?

Many thanks!

Jonny

On 7 February 2015 at 03:35, Matthew Mazloff <mmazloff at ucsd.edu> wrote:

> Hi Angela
>
> Regarding overlap
>
> Let's say your overlap, OLx, is 3.
>
> Then for
> >>> sNx=60
> >>> sNy=60
> you have (3+60+3)*(3+60+3) = 4356 grid cells per process, and 83% are
> interior cells.
>
> >>> sNx=30
> >>> sNy=30
> you have (3+30+3)*(3+30+3) = 1296 grid cells per processor, and 69% are
> interior cells.
>
> >>> sNx=2
> >>> sNy=2
> you have (3+2+3)*(3+2+3) = 64 grid cells per processor, and only 6% are
> interior cells.
>
> So by going down to sNx,sNy=2 you are at such an extreme that a huge
> percentage of your calculation is spent on the overlap region.
>
> If instead you doubled your resolution so sNx stayed the same but nPx
> doubled, and thus your overlap % stayed the same, then that would be a true
> scaling test. And for that, the MITgcm does very well.
>
> Regarding memory per node -- you can look at the hardware specs and see
> that, e.g., stampede nodes have about 32GB. Then you can estimate how much
> memory each of your processes needs. E.g. this can be estimated (somewhat
> inaccurately) with the command "size mitgcmuv".
> Look at the last number -- if it is around 2 GB per process, then using all
> 16 cores will leave no memory for the operating system and the node will
> have to swap (or crash). However, simply using 15 of the 16 cores per node
> will fix this, and performance will drastically improve.
>
> Let me know if this doesn't make sense
>
> Matt
>
>
>
>
>
> On Feb 6, 2015, at 2:43 PM, Angela Zalucha <azalucha at seti.org> wrote:
>
> > Matt, I would be interested to know more about what you are saying,
> because obviously I want to maximize the efficiency of the code, since I
> need to do some very very long simulations (multiple Pluto years where 1
> Pluto year = 248 Earth years).  My conclusion about processors came from
> testing on TACC Lonestar (12 cores/node, now defunct), TACC Stampede (16
> cores/node), local machines (Notus & Boreas with 24 processors each, and
> Ghost with 64), a funny computer cluster out of the University of Houston
> (Titan) that has both 12 cores per node and 8 cores per node and is really
> only useful up to 12 nodes due to poor connections between the nodes
> (though the individual nodes are very fast), and NASA HEC Pleiades (which
> offhand I think is 12 cores per node).
> >
> > You're right, the scaling is quite bad under my scheme, so if you or
> anyone could help, it would be quite valuable to me.
> >
> > I've attached a plot of my findings.  I've included only the fastest
> times, because as I said before there are multiple ways to divide up, say,
> 24 processors. (Sorry there are two files; I pulled them from different
> machines, one of which has since been yanked.)
> >
> >       Angela
> >
> > On 02/06/2015 01:39 PM, Matthew Mazloff wrote:
> >> Hi Angela
> >>
> >> The MITgcm scales far better than you are reporting. Given your use of
> sNx=2, I think you are not considering the extra overhead you are
> introducing by increasing the overlapping areas.
> >>
> >> And regarding node dependence, that is very dependent on platform and
> on the memory per process of your executable. I don't think it has anything
> to do with the faces of the cube-sphere setup you are running… but perhaps
> I am wrong on this. What I think happened is that when you exceeded 12
> processes on the node, you exceeded the available local memory, and that
> has nothing to do with communication.
> >>
> >> Finally, the number of processes per core you request will also be
> machine dependent. I suspect some machines would actually do better with
> nSx=2, even given the extra overlap.
> >>
> >> sorry to derail this thread...
> >> Matt
> >>
> >>
> >> On Feb 6, 2015, at 10:38 AM, Angela Zalucha <azalucha at seti.org> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm not sure why you would be getting NaNs, but I have found that
> there is a trick to increasing the number of processors.  I ran on a
> machine that has 12 processes per node, and the highest number of
> processors I could run was 1536 (I should point out that at high processor
> numbers, I found the code to be less efficient, so if you have a limited
> amount of processor hours, you might be better off running with fewer
> processors, e.g.: the wall clock time difference between 768 and 1536
> processors is only a factor of 1.03).
> >>>
> >>> Anyway, here are my SIZE.h parameters:
> >>> sNx=2
> >>> sNy=2
> >>> nSx=1
> >>> nSy=1
> >>> nPx=96
> >>> nPy=16
> >>>
> >>> I have noticed during my scaling tests (and maybe someone can confirm
> my explanations for this behavior) that:
> >>> 1) scaling tests on a 12 processors per node machine had faster wall
> clock times for a 12 processor/node test than a 16 processor/node test (I
> think owing to the cube-sphere geometry having a "built-in" factor of 6;
> communication across cube faces gets strange when the number of processors
> is not a multiple of 6)
> >>> (this deeply saddens me because the 12 processor machine I used to use
> was retired Jan. 1, and now I have to run on a 16 processor machine; even
> if this is the wave of the future, it hurts my efficiency)
> >>> 2) sNx*nSx*nPx = 192 and sNy*nSy*nPy=32
> >>> 3) For the same number of processors, faster wall clock times are
> achieved when nSx and nSy are minimized.
> >>>
> >>> I can produce tables and tables of configurations if you want, since
> at low processor counts there is degeneracy between sNx,nSx,nPx and
> sNy,nSy,nPy, respectively.
> >>>
> >>>   Angela
> >>>
> >>>
> >>> On 02/06/2015 08:45 AM, Jonny Williams wrote:
> >>>> Hi everyone
> >>>>
> >>>> I'm trying to run my regional model on 480 processors, up from a
> >>>> successfully working 48 processor version.
> >>>>
> >>>> I have recompiled my code.
> >>>>
> >>>> To do this (in SIZE.h) I reduced sNy by a factor of 10 and increased
> >>>> nPy by a factor of 10, so that nPx*nPy (which I think is the total
> >>>> number of processors) was increased by a factor of 10.
> >>>>
> >>>> The executable was created fine and the model does run, but the data
> >>>> I am getting out in my NetCDF files (mnc package) is all NaNs.
> >>>>
> >>>> Has anyone encountered this type of issue or know how to fix it?
> >>>>
> >>>> Is there a maximum number of processors?
> >>>>
> >>>> Many thanks
> >>>>
> >>>> Jonny
> >>>>
> >>>> --
> >>>> Dr Jonny Williams
> >>>> School of Geographical Sciences
> >>>> Cabot Institute
> >>>> University of Bristol
> >>>> BS8 1SS
> >>>>
> >>>> +44 (0)117 3318352
> >>>> jonny.williams at bristol.ac.uk
> >>>> http://www.bristol.ac.uk/geography/people/jonny-h-williams
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> MITgcm-support mailing list
> >>>> MITgcm-support at mitgcm.org
> >>>> http://mitgcm.org/mailman/listinfo/mitgcm-support
> >>>>
> >>>
> >>> --
> >>> =====================
> >>> Angela Zalucha, PhD
> >>> Research Scientist
> >>> SETI Institute
> >>> +1 (617) 894-2937
> >>> =====================
> >>>
> >>
> >>
> >
> > --
> > =====================
> > Angela Zalucha, PhD
> > Research Scientist
> > SETI Institute
> > +1 (617) 894-2937
> > =====================
> >
> > <scaling_all_2-eps-converted-to.pdf><scaling_all.eps>
>
>
>

--
Dr Jonny Williams
School of Geographical Sciences
Cabot Institute
University of Bristol
BS8 1SS

+44 (0)117 3318352
jonny.williams at bristol.ac.uk
http://www.bristol.ac.uk/geography/people/jonny-h-williams
```
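The halo arithmetic Matt walks through in the thread is easy to reproduce. A minimal Python sketch (the function name and layout are illustrative, not MITgcm code) that computes how many cells each tile allocates and what fraction of them are interior rather than overlap:

```python
def interior_fraction(sNx, sNy, OLx, OLy):
    """Fraction of a tile's allocated cells that are interior
    (computed) points rather than halo/overlap points."""
    total = (sNx + 2 * OLx) * (sNy + 2 * OLy)
    return (sNx * sNy) / total

# Matt's three cases, all with OLx = OLy = 3:
for s in (60, 30, 2):
    total = (s + 6) ** 2
    print(f"sNx=sNy={s}: {total} cells/tile, "
          f"{interior_fraction(s, s, 3, 3):.0%} interior")

# Jonny's SIZE.h case (sNx=75, sNy=10, OLx=OLy=4):
print(f"{interior_fraction(75, 10, 4, 4):.0%} interior")
```

This reproduces the 83%, 69%, 6%, and 50% figures quoted in the thread. As for the purpose of the overlap cells: they hold copies of neighbouring tiles' data so that finite-difference stencils near a tile edge can be evaluated without a communication call at every point; the required halo width grows with the stencil width of the chosen numerical schemes.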
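Matt's memory check can likewise be sketched in a few lines. Assuming a per-process footprint read from "size mitgcmuv" (its "dec" column is the static text+data+bss size in bytes; real usage will be somewhat higher), a hypothetical helper for choosing how many ranks to place on a node:

```python
GB = 1024 ** 3

def max_procs_per_node(bytes_per_process, node_ram_bytes,
                       os_reserve_bytes=2 * GB):
    """How many MPI processes fit on one node while leaving headroom
    for the operating system (the 2 GB reserve is a guess; tune it)."""
    usable = node_ram_bytes - os_reserve_bytes
    return max(0, usable // bytes_per_process)

# Matt's example: ~2 GB per process on a 32 GB, 16-core Stampede node.
# Using all 16 cores would need the full 32 GB and force swapping;
# leaving one core idle keeps the node healthy.
print(max_procs_per_node(2 * GB, 32 * GB))  # 15
```

The design point is simply that the limiting resource is often memory per node, not core count, which is why dropping from 16 to 15 processes per node can drastically improve performance.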