[MITgcm-support] changing number of processors

Matthew Mazloff mmazloff at ucsd.edu
Fri Feb 6 15:39:46 EST 2015


Hi Angela

The MITgcm scales far better than you are reporting. Given your use of sNx=2, I think you are not accounting for the extra overhead you introduce by increasing the overlapping (halo) areas relative to the tile interiors. 
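
To put a rough number on that overhead (a sketch only: I'm assuming OLx = OLy = 3 here; the actual overlap values live in your SIZE.h and depend on your advection scheme):

    # Ratio of points actually computed and stored per tile (interior
    # plus halo) to useful interior points, as a function of tile size.
    def overlap_overhead(sNx, sNy, OLx=3, OLy=3):
        interior = sNx * sNy
        with_halo = (sNx + 2 * OLx) * (sNy + 2 * OLy)
        return with_halo / interior

    print(overlap_overhead(2, 2))    # 16.0 -- tiny tiles are mostly halo
    print(overlap_overhead(32, 32))  # ~1.4

With 2x2 tiles you compute and exchange roughly sixteen times as many points as you actually keep, which eats whatever you gain from the extra processes.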

And regarding node dependence, that is very dependent on the platform and on the memory per process of your executable. I don't think it has anything to do with the faces of the cubed-sphere setup you are running… but perhaps I am wrong on this. What I think happened is that when you exceeded 12 processes on the node you exceeded the available local memory, and that has nothing to do with communication. 
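
A back-of-the-envelope way to check that (a sketch only; the count of ~100 3-D arrays is a guess and depends heavily on which packages are compiled in):

    # Very rough per-process memory footprint, counting only 3-D real*8
    # arrays held on the tiles (interior plus halo).
    def mem_per_process_GB(sNx, sNy, nSx, nSy, Nr,
                           OLx=3, OLy=3, n_3d_arrays=100):
        pts = (sNx + 2 * OLx) * (sNy + 2 * OLy) * Nr * nSx * nSy
        return pts * n_3d_arrays * 8 / 1e9  # 8 bytes per real*8 value

Multiply that by the number of processes you put on a node and compare it with the node's RAM; once you go over, performance falls apart for reasons that have nothing to do with communication.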

Finally, the optimal number of processes per core will also be machine dependent. I suspect some machines would actually do better with nSx=2, even given the extra overlap.

sorry to derail this thread...
Matt

  
On Feb 6, 2015, at 10:38 AM, Angela Zalucha <azalucha at seti.org> wrote:

> Hi,
> 
> I'm not sure why you would be getting NaNs, but I have found that there is a trick to increasing the number of processors. I ran on a machine with 12 processors per node, and the highest number of processors I could run was 1536. (I should point out that at high processor counts I found the code to be less efficient, so if you have a limited allocation of processor-hours you might be better off running with fewer processors; e.g., going from 768 to 1536 processors only reduced the wall clock time by a factor of 1.03.)
> 
> Anyway, here are my SIZE.h parameters:
> sNx=2
> sNy=2
> nSx=1
> nSy=1
> nPx=96
> nPy=16
> 
> I have noticed during my scaling tests (and maybe someone can confirm my explanations for this behavior) that:
> 1) Scaling tests with 12 processors per node had faster wall clock times than with 16 processors per node, I think owing to the cubed-sphere geometry having a "built-in" factor of 6; communication across cube faces gets strange when the number of processors is not a multiple of 6.
> (This deeply saddens me because the 12-processor-per-node machine I used to use was retired Jan. 1, and now I have to run on a 16-processor-per-node machine; even if this is the wave of the future, it hurts my efficiency.)
> 2) sNx*nSx*nPx = 192 and sNy*nSy*nPy = 32 in every valid configuration, since these products must equal the global grid dimensions (see the sketch after this list).
> 3) For the same number of processors, faster wall clock times are achieved when nSx and nSy are minimized.
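> 
> As a quick illustration of point 2 (a sketch only; the second decomposition below is made up, and I'm assuming the 192 x 32 global grid of my setup above):
> 
>     # A decomposition is consistent when these products reproduce the
>     # global grid; nPx*nPy is the number of MPI processes requested.
>     Nx, Ny = 192, 32
>     configs = [dict(sNx=2, sNy=2, nSx=1, nSy=1, nPx=96, nPy=16),  # my run
>                dict(sNx=4, sNy=4, nSx=1, nSy=1, nPx=48, nPy=8)]   # hypothetical
>     for c in configs:
>         assert c['sNx'] * c['nSx'] * c['nPx'] == Nx
>         assert c['sNy'] * c['nSy'] * c['nPy'] == Ny
>         print(c['nPx'] * c['nPy'], 'MPI processes')  # 1536, then 384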
> 
> I can produce tables and tables of configurations if you want, since at low processor counts there is degeneracy among sNx, nSx, nPx (and likewise among sNy, nSy, nPy).
> 
>   Angela
> 
> 
> On 02/06/2015 08:45 AM, Jonny Williams wrote:
>> Hi everyone
>> 
>> I'm trying to run my regional model on 480 processors, up from a
>> successfully working 48-processor version.
>> 
>> I have recompiled my code.
>> 
>> To do this (in SIZE.h) I reduced sNy by a factor of 10 and increased nPy
>> by a factor of 10, so that nPx*nPy (which I think is the total number of
>> processors) increased by a factor of 10.
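>> 
>> (For illustration with made-up numbers only: if the 48-processor run had
>> nPx=12, nPy=4, sNy=100, nSy=1, then the 480-processor version would have
>> nPy=40 and sNy=10, keeping the global Ny = sNy*nSy*nPy = 400 unchanged
>> while nPx*nPy goes from 48 to 480.)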
>> 
>> The executable was created fine and the model does run but the data I am
>> getting out in my NetCDF files (mnc package) is all NaNs.
>> 
>> Has anyone encountered this type of issue or know how to fix it?
>> 
>> Is there a maximum number of processors?
>> 
>> Many thanks
>> 
>> Jonny
>> 
>> --
>> Dr Jonny Williams
>> School of Geographical Sciences
>> Cabot Institute
>> University of Bristol
>> BS8 1SS
>> 
>> +44 (0)117 3318352
>> jonny.williams at bristol.ac.uk
>> http://www.bristol.ac.uk/geography/people/jonny-h-williams
>> <http://bit.ly/jonnywilliams>
>> 
>> 
> 
> -- 
> =====================
> Angela Zalucha, PhD
> Research Scientist
> SETI Institute
> +1 (617) 894-2937
> =====================
> 



