[MITgcm-support] (no subject)

Stephen Cousins steve.cousins at maine.edu
Thu Jul 14 13:27:44 EDT 2016


Thanks very much Jody,

I'll pass this along and see what they say.

If anyone can answer the question about optimal tiling and subdomain size per
process, that would be very helpful.

Best,

Steve

On Thu, Jul 14, 2016 at 1:23 PM, Jody Klymak <jklymak at uvic.ca> wrote:

> Hi Steve,
>
> I think for nonHydrostatic, you can dial down the number of iterations of
> cg3d.  Of course you won’t get as accurate a pressure field, but I think
> its still more accurate than a hydrostatic pressure field.  The folks
> analyzing the data may also want to think about whether they really need
> non-hydrostatic.  For dz=10 m, unless dx<40 m or so, I personally doubt
> they are getting any benefit for a whole lot of cost.  But that statement
> is somewhat problem dependent.
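>
> For concreteness, that would just mean loosening the 3-D solver settings in
> PARM02, something like (illustrative numbers only, worth checking against a
> reference run):
>
>  cg3dMaxIters=40,
>  cg3dTargetResidual=1.E-9,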
>
> Has anyone looked at the output?  The model slows down a lot when it
> starts dealing with large aphysical spikes and/or nan’s.  It won’t
> necessarily die on NaN’s.  Looking at the model’s text output should
> tell you enough to know if this is happening (often called mitgcm.out, but
> you can call it what you want).
>
> 64 and 128 GB sound adequate to me!
>
> You have very low explicit viscosities and diffusivities.  You also may be
> using the default advection scheme, which is second order.  I’m
> not sure what Leith does for you, but I would expect this model run to be
> quite “noisy”.  You might also try a higher order advection scheme:
>  tempAdvScheme=77 or 33. This will cause some numerical dissipation and
> diffusion, but maybe your run needs it...
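>
> In the data file that is just a couple of extra lines in PARM01 (a sketch,
> showing 33; 77 is set the same way):
>
>  tempAdvScheme=33,
>  saltAdvScheme=33,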
>
> Good luck,  Jody
>
>
>
> On 14 Jul 2016, at  10:06 AM, Stephen Cousins <steve.cousins at maine.edu>
> wrote:
>
> Hi Jody,
>
> Thanks a lot for your message. I believe it is non-hydrostatic that they
> are working with. The nodes have either 64 GB or 128 GB. It didn't look
> like memory was an issue when running the model. It looks like data is
> being saved every three model-hours. I'm not sure about input frequency or
> where to look. I'm checking on this. I see that they are doing something
> with tracers too. Would that significantly add to the compute time?
>
> For your specific questions:
>
> nonHydrostatic=.TRUE.
>
> # Elliptic solver parameters
>  &PARM02
>  cg2dMaxIters=1000,
>  cg2dTargetResidual=1.E-13,
>  cg3dMaxIters=400,
>  cg3dTargetResidual=1.E-13,
>  &
>
>
>
> And the complete "data" file looks like:
>
> # ====================
> # | Model parameters |
> # ====================
> #
> # Continuous equation parameters
>  &PARM01
>  tRef = 15.0971, 14.9392, 14.6122, 14.2581, 13.7115, 13.1676,
>  12.6783, 12.1445, 11.8036, 11.4157, 11.1446, 10.8365, 10.6152,
>  10.5101, 10.3086, 10.1419,  9.9432,  9.7966,  9.6557,  9.5575,
>   9.4637,  9.3636,  9.2620,  9.1635,  9.0730,  8.9940,  8.9250,
>   8.8634,  8.8064,  8.7516,  8.6965,  8.6408,  8.5845,  8.5276,
>   8.4701,
>
> # tRef =  15.0793,  14.5584,  13.4684,  12.1683,  11.2162,
> #    10.56,  10.1263,  9.81831,  9.53284,  9.33325,  9.17307,
> #  9.01999,  8.88124,  8.75708,   8.6409,    8.527,  8.41473,
> #  8.30469,  8.19683,  8.09055,  7.54703,  7.09268,  6.68999,
> #  6.33022,  6.01563,  5.71063,  5.42018,  4.97169,  4.65848,
> #  4.30342,  3.97323,  3.56167,  2.56126,   1.9613,  1.73713,
>  sRef = 33.1496, 33.1604, 33.1909, 33.2162, 33.2352, 33.2463,
>  33.2926, 33.3267, 33.3509, 33.3489, 33.3633, 33.3867, 33.4169,
>  33.4572, 33.5032, 33.5503, 33.5891, 33.6167, 33.6518, 33.6822,
>  33.7143, 33.7496, 33.7860, 33.8214, 33.8538, 33.8815, 33.9049,
>  33.9247, 33.9418, 33.9569, 33.9706, 33.9830, 33.9938, 34.0027,
>  34.0096,
>
> # sRef =  33.3242,  33.3558,  33.4183,  33.4952,  33.5596,
> # 33.6357,  33.6899,   33.742,  33.7821,  33.8284,  33.8728,
> # 33.9165,  33.9603,  33.9824,  34.0046,  34.0211,   34.032,
> # 34.0429,  34.0507,  34.0585,  34.0917,  34.1184,   34.145,
> # 34.1693,  34.1929,  34.2216,  34.2546,  34.3056,  34.3564,
> # 34.4014,  34.4369,  34.4803,  34.5635,  34.6228,  34.6581,
> # tRef = 12.1347,
> # sRef = 39*33.3242,
>  viscAz=1.E-5,
>  viscAh=1.E-5,
>  useFullLeith=.TRUE.,
>  viscC2leith=1,
>  viscA4GridMax = 1,
>  no_slip_sides=.FALSE.,
>  no_slip_bottom=.FALSE.,
>  bottomDragLinear=0.0002,
>  bottomDragQuadratic=0.00003,
>  diffKhT=1.E-5,
>  diffKzT=1.E-5,
>  diffKhS=1.E-5,
>  diffKzS=1.E-5,
>  rotationPeriod=86400.,
>  eosType='LINEAR',
>  gravity=9.81,
>  implicitFreeSurface=.TRUE.,
> # exactConserv=.TRUE.
>  nonHydrostatic=.TRUE.,
> # bottomDragLinear=0.02
> # hFacMin=0.5,
> # hFacMinDr=50.,
>  implicitDiffusion=.TRUE.,
>  implicitViscosity=.TRUE.,
>  hFacInf=0.2,
>  hFacSup=1.8,
>  saltStepping=.TRUE.,
>  tempStepping=.TRUE.,
> #- not safe to use globalFiles in multi-processors runs
> #globalFiles=.TRUE.,
>  readBinaryPrec=64,
>  writeBinaryPrec=64,
>  writeStatePrec=64,
>  &
>
> # Elliptic solver parameters
>  &PARM02
>  cg2dMaxIters=1000,
>  cg2dTargetResidual=1.E-13,
>  cg3dMaxIters=400,
>  cg3dTargetResidual=1.E-13,
>  &
>
> # Time stepping parameters
>  &PARM03
>  nIter0=0,
>  nTimeSteps=49320,
>  deltaT=10.,
>  abEps=0.1,
>  pChkptFreq=432000.,
>  chkptFreq=432000.,
>  dumpFreq=10800.,
>  monitorFreq=86400.,
>  monitorSelect=2,
>  periodicExternalForcing=.true.,
>  externForcingPeriod=21600.,
>  externForcingCycle=2613600.,
> # externForcingPeriod=2592000.,
> # externForcingCycle=31104000.,
>
>  &
>
> # Gridding parameters
>  &PARM04
>  usingSphericalPolarGrid=.TRUE.,
>  delXfile='dx.init',
>  delYfile='dy.init',
>  delRfile='dz.init',
> # delZ=2000.,
> #
>  delZ=10.,10.,10.,10.,10.,10.,10.,10.,10.,10.,25.,25.,25.,25.,25.,25.,25.,25.,
>  25.,25.,25.,25.,25.,25.,25.,25.,50.,50.,50.,50.,50.,50.,100.,100.,100.,100.,500.,500.,500.,
>  ygOrigin=36.,
>  xgOrigin=-123.5,
>  &
>
> # Input datasets
>  &PARM05
>  hydrogThetaFile='temp.init',
>  hydrogSaltFile ='salt.init',
>  bathyFile='topo.init',
> # zonalWindFile='Utau.init',
> # meridWindFile='Vtau.init',
>  &
>
> Here is the last thing I tried for SIZE.h:
>
>      PARAMETER (
>      &           sNx =  100,
>      &           sNy =  65,
>      &           OLx =   3,
>      &           OLy =   3,
>      &           nSx =   1,
>      &           nSy =   1,
>      &           nPx =   6,
>      &           nPy =   8,
>      &           Nx  = sNx*nSx*nPx,
>      &           Ny  = sNy*nSy*nPy,
>      &           Nr  =  35)
>
> So, I guess I was wrong. This run was using 35 vertical layers rather
> than 21.
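>
> For what it's worth, that decomposition is at least self-consistent:
> sNx*nSx*nPx = 100*1*6 = 600 and sNy*nSy*nPy = 65*1*8 = 520, matching the
> 600 x 520 horizontal grid, so each of the 48 processes owns a single
> 100 x 65 x 35 tile plus the 3-cell overlaps.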
>
> Do you see any obvious things that are causing this to be very slow?
>
> Thanks again,
>
> Steve
>
> On Thu, Jul 14, 2016 at 12:22 PM, Jody Klymak <jklymak at uvic.ca> wrote:
>
>> Hi Steve,
>>
>> What is your value of nonHydrostatic in the `data` file, and what are
>> your cg3d convergence criteria?  nonhydrostatic can make the code run
>> 4-10xs slower, and I think has a bunch more inter-tile communications.
>>
>> How much memory does each node have?
>>
>> How often are you writing out data?
>>
>> How often do you have to read in data?
>>
>> Recently, I was running 1024 x 256 x 400 in *hydrostatic* mode on 128
>> processors (haise.navo.hpc.mil), and I would get 60 time steps a
>> minute.  So, for your total domain being 16 times smaller, it seems you are
>> maybe running 10-20 times slower.  So I bet you are running in
>> non-hydrostatic mode, or you are inducing a lot of i/o for some reason.
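>>
>> (Rough arithmetic behind that: your domain is 600*520*21, about 6.6 million
>> cells, versus my 1024*256*400, about 105 million, so roughly 16x smaller.
>> On half the processors you might naively hope for about 8x my step rate,
>> i.e. several hundred steps per minute, and you are seeing 30, hence the
>> factor of 10-20.)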
>>
>> Good luck - hopefully the actual gurus have other suggestions.
>>
>> Cheers,   Jody
>>
>> On 14 Jul 2016, at  8:45 AM, Stephen Cousins <steve.cousins at maine.edu>
>> wrote:
>>
>> Hi,
>>
>> I'm trying to help researchers from the University of Maine run
>> MITgcm. The model runs, but they think it should run much faster.
>>
>> I have run or helped run many models while working for the Ocean Modeling
>> Group; however, this is the first time I have encountered MITgcm.
>>
>> With Rutgers ROMS there is a method of running a number of tiles per
>> sub-domain and it seems that MITgcm can do that too. The reason for doing
>> so with ROMS was (I believe) to try to get the tiles to fit in cache to
>> increase performance. Is that the reason for doing so with MITgcm? We have
>> tried a number of combinations with not much luck.
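>>
>> For reference, by "tiles per subdomain" I mean the nSx/nSy mechanism in
>> SIZE.h, where each MPI process owns nSx*nSy tiles of size sNx x sNy. One
>> illustrative (untested) variant of our 600 x 520 domain on 48 processes,
>> with two tiles per process instead of one, would be:
>>
>>      PARAMETER (
>>      &           sNx =  50,
>>      &           sNy =  65,
>>      &           OLx =   3,
>>      &           OLy =   3,
>>      &           nSx =   2,
>>      &           nSy =   1,
>>      &           nPx =   6,
>>      &           nPy =   8,
>>      &           Nx  = sNx*nSx*nPx,
>>      &           Ny  = sNy*nSy*nPy,
>>      &           Nr  =  21)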
>>
>> For testing, the full domain we have is 600 x 520 x 21; using 64 processes
>> we are getting only 30 time steps per minute. I wondered if the domain was too
>> small for that many processes, so I reduced the number of processes, but that
>> didn't help. The plan is to triple the resolution in each horizontal
>> direction and double in the vertical.
>>
>> Our cluster has nodes with Intel E5-2600v3 processors totaling 24 cores
>> per node with FDR-10 Infiniband. The way the jobs were specified, some
>> compute nodes had many processes (like 20) on them and some had only 1 or
>> 2. I experimented and found that by using only 4 cores per node and only 48
>> cores, it ran close to twice as fast as with 64 cores and a mix of the
>> numbers of cores per node. To me this indicates that
>> inter-process communication is high and is saturating the memory
>> bandwidth of the nodes with large process counts. That might point to the
>> subdomains being too small (the halo region being a significant proportion of
>> the subdomain), but in that case, when I decreased the run to 16 cores, I
>> would have thought it would have improved things quite a bit. I
>> haven't profiled the code yet. I thought it might be quicker to write to
>> you to get some information first.
>>
>> Can you please explain what the optimal layout is for performance? Is
>> there an optimal subdomain size that you know of for these processors?
>> Optimal number of tiles per subdomain? Also, can you explain at a somewhat
>> high level any other factors to consider when running the model to get
>> better performance? Also, are there Intel Haswell CPU-specific compiler
>> flags (we're using the Intel compilers with MVAPICH2) that you can
>> recommend to us? Finally, is there a benchmark case where we can verify
>> that we are getting the expected performance?
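>>
>> (In case it helps whoever answers: my untested guess on the Haswell side is
>> that it mostly comes down to adding -xCORE-AVX2 to the Fortran optimization
>> flags, e.g. FOPTIM='-O3 -xCORE-AVX2 -fp-model precise', in whatever genmake2
>> build options file we end up using, but corrections welcome.)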
>>
>> Thanks very much,
>>
>> Steve
>> --
>> ________________________________________________________________
>>  Steve Cousins             Supercomputer Engineer/Administrator
>>  Advanced Computing Group            University of Maine System
>>  244 Neville Hall (UMS Data Center)              (207) 561-3574
>>  Orono ME 04469                      steve.cousins at maine.edu
>>
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>>
>>
>> --
>> Jody Klymak
>> http://web.uvic.ca/~jklymak/
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>>
>>
>
>
> --
> ________________________________________________________________
>  Steve Cousins             Supercomputer Engineer/Administrator
>  Advanced Computing Group            University of Maine System
>  244 Neville Hall (UMS Data Center)              (207) 561-3574
>  Orono ME 04469                      steve.cousins at maine.edu
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
>
>
> --
> Jody Klymak
> http://web.uvic.ca/~jklymak/
>
>
>
>
>
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
>
>


-- 
________________________________________________________________
 Steve Cousins             Supercomputer Engineer/Administrator
 Advanced Computing Group            University of Maine System
 244 Neville Hall (UMS Data Center)              (207) 561-3574
 Orono ME 04469                      steve.cousins at maine.edu

