[MITgcm-support] changing number of processors

Fri Feb 6 22:35:11 EST 2015

Hi Angela

Regarding overlap

Let's say your overlap, OLx, is 3.

Then for 
>>> sNx=60
>>> sNy=60
you have (3+60+3)* (3+60+3) = 4356 grid cells per process and 83% are cells you care about.

>>> sNx=30
>>> sNy=30
you have (3+30+3)* (3+30+3) = 1296 grid cells per process and 69% are cells you care about.

>>> sNx=2
>>> sNy=2
you have (3+2+3)* (3+2+3) = 64 grid cells per process and 6% are cells you care about.

So by going all the way down to sNx = sNy = 2, your scaling is so extreme that a huge percentage of your calculation is spent just in the overlap region.
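The overlap arithmetic above can be checked with a few lines of Python. This is a minimal sketch assuming one tile per process (nSx = nSy = 1) and a symmetric overlap OLx = OLy = 3, as in the example; `halo_stats` is a hypothetical helper, not part of MITgcm.

```python
def halo_stats(snx: int, sny: int, olx: int = 3, oly: int = 3) -> tuple[int, float]:
    """Return (cells per tile including the overlap, fraction that are interior cells)."""
    total = (olx + snx + olx) * (oly + sny + oly)
    interior = snx * sny
    return total, interior / total


for n in (60, 30, 2):
    total, frac = halo_stats(n, n)
    print(f"sNx=sNy={n}: {total} cells per process, {frac:.0%} interior")
```

This reproduces the 4356/83%, 1296/69% and 64/6% figures; the vertical dimension multiplies interior and overlap cells alike, so it does not change the fractions.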

If instead you doubled your resolution, so that sNx stayed the same but nPx doubled (and thus your overlap percentage stayed the same), that would be a true scaling test. And for that, the MITgcm does very well.
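The difference between this kind of test and the one above can be sketched as follows, assuming one tile per process and OLx = OLy = 3. The helper `refine` and the starting decomposition (30 x 30 tiles on a 4 x 4 process grid) are hypothetical; refining the grid keeps sNx and sNy fixed and multiplies nPx and nPy instead, so the interior fraction does not change.

```python
def interior_fraction(snx: int, sny: int, olx: int = 3, oly: int = 3) -> float:
    """Fraction of a tile's cells (overlap included) that are interior cells."""
    return (snx * sny) / ((snx + 2 * olx) * (sny + 2 * oly))


def refine(snx: int, sny: int, npx: int, npy: int, factor: int) -> tuple[int, int, int, int]:
    """Refine the global grid by `factor` in x and y, keeping the tile size
    fixed and adding processes instead (weak scaling)."""
    return snx, sny, npx * factor, npy * factor


base = (30, 30, 4, 4)            # hypothetical sNx, sNy, nPx, nPy
finer = refine(*base, factor=2)  # doubled resolution in both directions

for snx, sny, npx, npy in (base, finer):
    print(f"Nx={snx * npx}, Ny={sny * npy}, processes={npx * npy}, "
          f"interior={interior_fraction(snx, sny):.0%}")
```

Doubling the resolution in both directions quadruples the number of processes while the interior fraction stays the same, so any slowdown reflects communication and hardware rather than extra overlap work.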

Regarding memory per node: you can look at the hardware specs and see that, e.g., Stampede nodes have about 32 GB. Then you can estimate how much memory each of your processes needs. This can be estimated (somewhat inaccurately) with the command:

    size mitgcmuv

Look at the total (the "dec" column; the last number, "hex", is the same total in hexadecimal). If it is at 2 GB/core and you use all 16 cores, you will leave no memory for the operating system, and the node will have to swap (or crash). However, just reducing to 15 of the 16 cores per node will fix this, and the performance will drastically improve.
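The memory estimate can be sketched in Python. This assumes the Berkeley-style output that GNU `size` prints by default (columns `text data bss dec hex filename`) and treats the `dec` total as the per-process footprint; since `size` counts only the statically sized sections, it is a rough estimate. The sample output, the 32 GiB node and the 1 GiB reserved for the operating system are hypothetical numbers chosen to match the 2 GB/core example.

```python
GIB = 2**30


def size_total(output: str) -> int:
    """Return the 'dec' column (total bytes) from Berkeley-format `size` output."""
    header, values = output.strip().splitlines()[:2]
    return int(values.split()[header.split().index("dec")])


def max_procs_per_node(bytes_per_proc: int, node_bytes: int,
                       cores: int, os_reserve: int) -> int:
    """Largest number of processes that fit on a node after reserving memory for the OS."""
    usable = node_bytes - os_reserve
    return min(cores, usable // bytes_per_proc)


# Hypothetical output of `size mitgcmuv` for an executable of about 2 GB.
sample = """\
   text    data     bss     dec     hex filename
1200000   50000 2097152000 2098402000 7d1312d0 mitgcmuv
"""

per_proc = size_total(sample)
print(per_proc, max_procs_per_node(per_proc, 32 * GIB, cores=16, os_reserve=1 * GIB))
```

On a real system `sample` would be replaced by the text printed by `size mitgcmuv`. With these numbers, 16 processes do not fit once the OS reserve is subtracted, but 15 do.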

Let me know if this doesn't make sense.

Matt

On Feb 6, 2015, at 2:43 PM, Angela Zalucha <azalucha at seti.org> wrote:

> Matt, I would be interested to know more about what you are saying, because I obviously want to maximize the efficiency of the code: I need to do some very, very long simulations (multiple Pluto years, where 1 Pluto year = 248 Earth years). My conclusion about processors came from testing on TACC Lonestar (12 cores/node, now defunct), TACC Stampede (16 cores/node), two local machines (Notus & Boreas with 24 and Ghost with 64 processors each), a funny computer cluster at the University of Houston (Titan) that has both 12-core and 8-core nodes and is really only useful up to 12 nodes due to poor connections between the nodes (although the individual nodes are very fast), and NASA HEC Pleiades (which offhand I think is 12 cores/node).
> 
> You're right, the scaling is quite bad under my scheme, so if you or anyone could help, it would be quite valuable to me.
> 
> I've attached a plot of my findings. I've included only the fastest times because, as I said before, there are multiple ways to decompose, say, 24 processors. (Sorry, there are two files; I pulled them from different sources, since I have only recently had access to Stampede and my access to other machines has been yanked.)
> 
> 	Angela
> 
> On 02/06/2015 01:39 PM, Matthew Mazloff wrote:
>> Hi Angela
>> 
>> The MITgcm scales far better than you are reporting. Given your use of sNx=2, I think you are not considering the extra overhead you are introducing by increasing the overlapping areas.
>> 
>> And regarding node dependence, that is very dependent on the platform and on the memory per process of your executable. I don't think it has anything to do with the faces of the cube-sphere setup you are running…but perhaps I am wrong on this. What I think happened is that when you exceeded 12 processes on the node, you exceeded the available local memory, and that has nothing to do with communication.
>> 
>> Finally, the number of processes/core you request will also be machine-dependent. I suspect some cores would actually do better with nSx=2, even given the extra overlap.
>> 
>> sorry to derail this thread...
>> Matt
>> 
>> 
>> On Feb 6, 2015, at 10:38 AM, Angela Zalucha <azalucha at seti.org> wrote:
>> 
>>> Hi,
>>> 
>>> I'm not sure why you would be getting NaNs, but I have found that there is a trick to increasing the number of processors. I ran on a machine that has 12 processors per node, and the highest number of processors I could run on was 1536. (I should point out that at high processor counts I found the code to be less efficient, so if you have a limited allocation of processor hours, you might be better off running with fewer processors; e.g., the wall clock time difference between 768 and 1536 processors is only a factor of 1.03.)
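The cost of that last doubling can be expressed as parallel efficiency: measured speedup divided by ideal speedup. A minimal sketch using the factor of 1.03 quoted above (only the ratio of wall clock times matters, so the 1536-processor time is set to 1; `parallel_efficiency` is a hypothetical helper):

```python
def parallel_efficiency(t_few: float, t_many: float, p_few: int, p_many: int) -> float:
    """Strong-scaling efficiency when going from p_few to p_many processors."""
    speedup = t_few / t_many  # measured speedup
    ideal = p_many / p_few    # speedup if scaling were perfect
    return speedup / ideal


eff = parallel_efficiency(t_few=1.03, t_many=1.0, p_few=768, p_many=1536)
extra_cost = (1536 / 768) / 1.03  # ratio of processor hours consumed
print(f"efficiency {eff:.1%}, processor hours x{extra_cost:.2f}")
```

An efficiency of about 51.5% means the 1536-processor run consumes roughly 1.94 times the processor hours of the 768-processor run for almost the same wall clock time.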
>>> 
>>> Anyway, here are my SIZE.h parameters:
>>> sNx=2
>>> sNy=2
>>> nSx=1
>>> nSy=1
>>> nPx=96
>>> nPy=16
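In MITgcm's SIZE.h the global grid dimensions are Nx = sNx*nSx*nPx and Ny = sNy*nSy*nPy, and the number of MPI processes is nPx*nPy. A small Python check of the parameters above (the dictionary and helper names are illustrative only):

```python
size_h = {"sNx": 2, "sNy": 2, "nSx": 1, "nSy": 1, "nPx": 96, "nPy": 16}


def global_grid(p: dict[str, int]) -> tuple[int, int]:
    """Global (Nx, Ny) implied by a SIZE.h-style decomposition."""
    return p["sNx"] * p["nSx"] * p["nPx"], p["sNy"] * p["nSy"] * p["nPy"]


def n_processes(p: dict[str, int]) -> int:
    """Number of MPI processes: one per (nPx, nPy) pair."""
    return p["nPx"] * p["nPy"]


print(global_grid(size_h), n_processes(size_h))
```

This gives a 192 x 32 global grid on 1536 processes, matching the 1536 processors mentioned above.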
>>> 
>>> I have noticed the following during my scaling tests (maybe someone can confirm my explanations for this behavior):
>>> 1) Scaling tests run at 12 processors/node (on a 12-processor/node machine) had faster wall clock times than tests run at 16 processors/node. I think this is owing to the cube-sphere geometry having a "built-in" factor of 6: communication across cube faces gets strange when the number of processors is not a multiple of 6.
>>> (This deeply saddens me, because the 12-processor/node machine I used to use was retired Jan. 1, and now I have to run on a 16-processor/node machine. Even if this is the wave of the future, it hurts my efficiency.)
>>> 2) sNx*nSx*nPx = 192 and sNy*nSy*nPy = 32, i.e. the global grid size stays fixed whatever the decomposition.
>>> 3) For the same number of processors, faster wall clock times are achieved when nSx and nSy are minimized.
>>> 
>>> I can produce tables and tables of configurations if you want, since at low processor counts there is degeneracy between sNx, nSx, nPx and sNy, nSy, nPy, respectively.
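The degeneracy can be enumerated directly. A minimal sketch, assuming the only constraints are the SIZE.h identities sNx*nSx*nPx = Nx, sNy*nSy*nPy = Ny and nPx*nPy = number of processes; it ignores any further restrictions the cube-sphere exchange code may place on tile layout, and `decompositions` is a hypothetical helper. Following point 3, candidates with the fewest tiles per process are listed first.

```python
def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def decompositions(nx: int, ny: int, nprocs: int) -> list[tuple[int, ...]]:
    """All (sNx, nSx, nPx, sNy, nSy, nPy) with sNx*nSx*nPx == nx,
    sNy*nSy*nPy == ny and nPx*nPy == nprocs."""
    found = []
    for npx in divisors(nprocs):
        npy = nprocs // npx
        if nx % npx or ny % npy:
            continue  # the process grid must divide the global grid evenly
        for nsx in divisors(nx // npx):
            for nsy in divisors(ny // npy):
                found.append((nx // (npx * nsx), nsx, npx,
                              ny // (npy * nsy), nsy, npy))
    # Fewest tiles per process (nSx*nSy) first, per point 3 above.
    return sorted(found, key=lambda d: d[1] * d[4])


for d in decompositions(192, 32, 24)[:5]:
    print("sNx=%d nSx=%d nPx=%d  sNy=%d nSy=%d nPy=%d" % d)
```

Each candidate can then be scored by its overlap fraction or timed directly, which is one way to pick among the degenerate layouts.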
>>> 
>>>   Angela
>>> 
>>> 
>>> On 02/06/2015 08:45 AM, Jonny Williams wrote:
>>>> Hi everyone
>>>> 
>>>> I'm trying to run my regional model on 480 processors, up from a
>>>> successfully working 48-processor version.
>>>> 
>>>> I have recompiled my code.
>>>> 
>>>> To do this (in SIZE.h), I reduced sNy by a factor of 10 and increased nPy
>>>> by a factor of 10, so that nPx*nPy, which I think is the total number of
>>>> processors, increased by a factor of 10.
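A change like this preserves the model grid only if sNy divides evenly by the factor, since Ny = sNy*nSy*nPy must stay the same. A minimal sanity-check sketch; the helper `split_y` and the example values (sNy=50, nSy=1, nPy=8) are hypothetical, and passing the check does not by itself rule out other causes of the NaNs.

```python
def split_y(sny: int, nsy: int, npy: int, factor: int) -> tuple[int, int]:
    """Return new (sNy, nPy) after spreading each y-tile over `factor` processes,
    refusing changes that would alter the global Ny = sNy*nSy*nPy."""
    if sny % factor:
        raise ValueError(f"sNy={sny} is not divisible by {factor}; Ny would change")
    # new_sNy * nSy * new_nPy == sNy * nSy * nPy whenever the division is exact.
    return sny // factor, npy * factor


print(split_y(sny=50, nsy=1, npy=8, factor=10))
```

With these hypothetical values, tiles shrink from 50 to 5 rows and nPy grows from 8 to 80; at that size the overlap region becomes a large share of each tile.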
>>>> 
>>>> The executable was created fine and the model does run but the data I am
>>>> getting out in my NetCDF files (mnc package) is all NaNs.
>>>> 
>>>> Has anyone encountered this type of issue or know how to fix it?
>>>> 
>>>> Is there a maximum number of processors?
>>>> 
>>>> Many thanks
>>>> 
>>>> Jonny
>>>> 
>>>> --
>>>> Dr Jonny Williams
>>>> School of Geographical Sciences
>>>> Cabot Institute
>>>> University of Bristol
>>>> BS8 1SS
>>>> 
>>>> +44 (0)117 3318352
>>>> jonny.williams at bristol.ac.uk
>>>> http://www.bristol.ac.uk/geography/people/jonny-h-williams
>>>> <http://bit.ly/jonnywilliams>
>>>> 
>>>> 
>>>> _______________________________________________
>>>> MITgcm-support mailing list
>>>> MITgcm-support at mitgcm.org
>>>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>>>> 
>>> 
>>> --
>>> =====================
>>> Angela Zalucha, PhD
>>> Research Scientist
>>> SETI Institute
>>> +1 (617) 894-2937
>>> =====================
>>> 
>> 
>> 
> 
> <scaling_all_2-eps-converted-to.pdf><scaling_all.eps>