[MITgcm-support] (no subject)

Stephen Cousins steve.cousins at maine.edu
Thu Jul 14 13:06:30 EDT 2016


Hi Jody,

Thanks a lot for your message. I believe it is non-hydrostatic that they
are working with. The nodes have either 64 GB or 128 GB. It didn't look
like memory was an issue when running the model. It looks like data is
being saved every three model-hours. I'm not sure about input frequency or
where to look. I'm checking on this. I see that they are doing something
with tracers too. Would that significantly add to the compute time?

For your specific questions:

nonHydrostatic=.TRUE.

# Elliptic solver parameters
 &PARM02
 cg2dMaxIters=1000,
 cg2dTargetResidual=1.E-13,
 cg3dMaxIters=400,
 cg3dTargetResidual=1.E-13,
 &



And the complete "data" file looks like:

# ====================
# | Model parameters |
# ====================
#
# Continuous equation parameters
 &PARM01
 tRef = 15.0971, 14.9392, 14.6122, 14.2581, 13.7115, 13.1676,
 12.6783, 12.1445, 11.8036, 11.4157, 11.1446, 10.8365, 10.6152,
 10.5101, 10.3086, 10.1419,  9.9432,  9.7966,  9.6557,  9.5575,
  9.4637,  9.3636,  9.2620,  9.1635,  9.0730,  8.9940,  8.9250,
  8.8634,  8.8064,  8.7516,  8.6965,  8.6408,  8.5845,  8.5276,
  8.4701,

# tRef =  15.0793,  14.5584,  13.4684,  12.1683,  11.2162,
#    10.56,  10.1263,  9.81831,  9.53284,  9.33325,  9.17307,
#  9.01999,  8.88124,  8.75708,   8.6409,    8.527,  8.41473,
#  8.30469,  8.19683,  8.09055,  7.54703,  7.09268,  6.68999,
#  6.33022,  6.01563,  5.71063,  5.42018,  4.97169,  4.65848,
#  4.30342,  3.97323,  3.56167,  2.56126,   1.9613,  1.73713,
 sRef = 33.1496, 33.1604, 33.1909, 33.2162, 33.2352, 33.2463,
 33.2926, 33.3267, 33.3509, 33.3489, 33.3633, 33.3867, 33.4169,
 33.4572, 33.5032, 33.5503, 33.5891, 33.6167, 33.6518, 33.6822,
 33.7143, 33.7496, 33.7860, 33.8214, 33.8538, 33.8815, 33.9049,
 33.9247, 33.9418, 33.9569, 33.9706, 33.9830, 33.9938, 34.0027,
 34.0096,

# sRef =  33.3242,  33.3558,  33.4183,  33.4952,  33.5596,
# 33.6357,  33.6899,   33.742,  33.7821,  33.8284,  33.8728,
# 33.9165,  33.9603,  33.9824,  34.0046,  34.0211,   34.032,
# 34.0429,  34.0507,  34.0585,  34.0917,  34.1184,   34.145,
# 34.1693,  34.1929,  34.2216,  34.2546,  34.3056,  34.3564,
# 34.4014,  34.4369,  34.4803,  34.5635,  34.6228,  34.6581,
# tRef = 12.1347,
# sRef = 39*33.3242,
 viscAz=1.E-5,
 viscAh=1.E-5,
 useFullLeith=.TRUE.,
 viscC2leith=1,
 viscA4GridMax = 1,
 no_slip_sides=.FALSE.,
 no_slip_bottom=.FALSE.,
 bottomDragLinear=0.0002,
 bottomDragQuadratic=0.00003,
 diffKhT=1.E-5,
 diffKzT=1.E-5,
 diffKhS=1.E-5,
 diffKzS=1.E-5,
 rotationPeriod=86400.,
 eosType='LINEAR',
 gravity=9.81,
 implicitFreeSurface=.TRUE.,
# exactConserv=.TRUE.
 nonHydrostatic=.TRUE.,
# bottomDragLinear=0.02
# hFacMin=0.5,
# hFacMinDr=50.,
 implicitDiffusion=.TRUE.,
 implicitViscosity=.TRUE.,
 hFacInf=0.2,
 hFacSup=1.8,
 saltStepping=.TRUE.,
 tempStepping=.TRUE.,
#- not safe to use globalFiles in multi-processors runs
#globalFiles=.TRUE.,
 readBinaryPrec=64,
 writeBinaryPrec=64,
 writeStatePrec=64,
 &

# Elliptic solver parameters
 &PARM02
 cg2dMaxIters=1000,
 cg2dTargetResidual=1.E-13,
 cg3dMaxIters=400,
 cg3dTargetResidual=1.E-13,
 &

# Time stepping parameters
 &PARM03
 nIter0=0,
 nTimeSteps=49320,
 deltaT=10.,
 abEps=0.1,
 pChkptFreq=432000.,
 chkptFreq=432000.,
 dumpFreq=10800.,
 monitorFreq=86400.,
 monitorSelect=2,
 periodicExternalForcing=.true.,
 externForcingPeriod=21600.,
 externForcingCycle=2613600.,
# externForcingPeriod=2592000.,
# externForcingCycle=31104000.,

 &

# Gridding parameters
 &PARM04
 usingSphericalPolarGrid=.TRUE.,
 delXfile='dx.init',
 delYfile='dy.init',
 delRfile='dz.init',
# delZ=2000.,
#
 delZ=10.,10.,10.,10.,10.,10.,10.,10.,10.,10.,25.,25.,25.,25.,25.,25.,25.,25.,
      25.,25.,25.,25.,25.,25.,25.,25.,50.,50.,50.,50.,50.,50.,100.,100.,100.,
      100.,500.,500.,500.,
 ygOrigin=36.,
 xgOrigin=-123.5,
 &
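For what it's worth, here is a quick sketch I used to sanity-check the vertical grid implied by that delZ list (joining it across the wrapped lines exactly as written; note that delRfile='dz.init' is also set, so dz.init may be what is actually read rather than this list):

```python
# Count the vertical levels and total depth implied by the delZ list in
# PARM04 above: ten 10 m layers, sixteen 25 m, six 50 m, four 100 m,
# three 500 m.
delZ = [10.0] * 10 + [25.0] * 16 + [50.0] * 6 + [100.0] * 4 + [500.0] * 3

print(len(delZ))  # 39 levels
print(sum(delZ))  # 2700.0 m total depth
```

That list works out to 39 levels, which does not match the Nr below, so presumably dz.init is the grid actually in use.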

# Input datasets
 &PARM05
 hydrogThetaFile='temp.init',
 hydrogSaltFile ='salt.init',
 bathyFile='topo.init',
# zonalWindFile='Utau.init',
# meridWindFile='Vtau.init',
 &

Here is the last thing I tried for SIZE.h:

     PARAMETER (
     &           sNx =  100,
     &           sNy =  65,
     &           OLx =   3,
     &           OLy =   3,
     &           nSx =   1,
     &           nSy =   1,
     &           nPx =   6,
     &           nPy =   8,
     &           Nx  = sNx*nSx*nPx,
     &           Ny  = sNy*nSy*nPy,
     &           Nr  =  35)

So, I guess I was wrong. This run was using 35 vertical layers rather than
21.
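To double-check the decomposition that SIZE.h implies, I put together this quick sketch (the halo fraction here is just the ratio of padded tile to interior tile, a rough proxy for overlap storage/communication overhead, not a measured cost):

```python
# Decomposition implied by the SIZE.h above.
sNx, sNy = 100, 65   # tile size per process
OLx, OLy = 3, 3      # overlap (halo) width
nPx, nPy = 6, 8      # processes in x and y

Nx, Ny = sNx * nPx, sNy * nPy   # full domain: 600 x 520
nprocs = nPx * nPy              # 48 MPI processes

# Fraction of each padded tile that is halo rather than interior points.
padded = (sNx + 2 * OLx) * (sNy + 2 * OLy)
interior = sNx * sNy
halo_fraction = 1 - interior / padded

print(Nx, Ny, nprocs)            # 600 520 48
print(round(halo_fraction, 3))   # 0.136
```

So at 100 x 65 tiles the halo is only about 14% of each tile, which by itself does not look like enough overlap overhead to explain the slowdown.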

Do you see any obvious things that are causing this to be very slow?

Thanks again,

Steve

On Thu, Jul 14, 2016 at 12:22 PM, Jody Klymak <jklymak at uvic.ca> wrote:

> Hi Steve,
>
> What is your value of nonHydrostatic in the `data` file, and what are your
> cg3d convergence criteria?  nonhydrostatic can make the code run 4-10x
> slower, and I think has a bunch more inter-tile communications.
>
> How much memory does each node have?
>
> How often are you writing out data?
>
> How often do you have to read in data?
>
> Recently, I was running 1024 x 256 x 400 in *hydrostatic* mode on 128
> processors (haise.navo.hpc.mil), and I would get 60 time steps a minute.
> So, for your total domain being 16 times smaller, it seems you are maybe
> running 10-20 times slower.  So I bet you are running in non-hydrostatic
> mode, or you are inducing a lot of i/o for some reason.
>
> Good luck - hopefully the actual gurus have other suggestions.
>
> Cheers,   Jody
>
> On 14 Jul 2016, at  8:45 AM, Stephen Cousins <steve.cousins at maine.edu>
> wrote:
>
> Hi,
>
> I'm trying to help researchers from the University of Maine to run MITgcm.
> The model runs, but they think it should run much faster.
>
> I have run or helped run many models while working for the Ocean Modeling
> Group; however, this is the first time I have encountered MITgcm.
>
> With Rutgers ROMS there is a method of running a number of tiles per
> sub-domain and it seems that MITgcm can do that too. The reason for doing
> so with ROMS was (I believe) to try to get the tiles to fit in cache to
> increase performance. Is that the reason for doing so with MITgcm? We have
> tried a number of combinations with not much luck.
>
> For testing, the full domain we have is 600 x 520 x 21 using 64 processes
> and getting only 30 time steps per minute. I wondered if the domain was too
> small for that many processes so I reduced the number of processes but that
> didn't help. The plan is to triple the resolution in each horizontal
> direction and double in the vertical.
>
> Our cluster has nodes with Intel E5-2600v3 processors totaling 24 cores
> per node with FDR-10 Infiniband. The way the jobs were specified, some
> compute nodes had many processes (like 20) on them and some had only 1 or
> 2. I experimented and found that by using only 4 cores per node and only 48
> cores, it ran close to twice as fast as with 64 cores and a mix of the
> numbers of cores per node. To me this indicates that the
> inter-process-communication is high and it is saturating the memory
> bandwidth of the nodes with large process counts. That might point to the
> subdomains being too small (the halo region being a significant proportion of
> the subdomain), but in that case when I decreased the run to 16 cores I
> would have thought that it would have improved things quite a bit. I
> haven't profiled the code yet. I thought it might be quicker to write to
> you to get some information first.
>
> Can you please explain what the optimal layout is for performance? Is
> there an optimal size subdomain that you know of for these processors?
> Optimal number of tiles per subdomain? Also can you explain at a somewhat
> high level any other factors to consider when running the model to get
> better performance? Also, are there Intel Haswell CPU-specific compiler
> flags (we're using the Intel compilers with MVAPICH2) that you can
> recommend to us? Finally, is there a benchmark case where we can verify
> that we are getting the expected performance?
>
> Thanks very much,
>
> Steve
> --
> ________________________________________________________________
>  Steve Cousins             Supercomputer Engineer/Administrator
>  Advanced Computing Group            University of Maine System
>  244 Neville Hall (UMS Data Center)              (207) 561-3574
>  Orono ME 04469                      steve.cousins at maine.edu
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
>
>
> --
> Jody Klymak
> http://web.uvic.ca/~jklymak/
>
>
>
>
>
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
>
>


-- 
________________________________________________________________
 Steve Cousins             Supercomputer Engineer/Administrator
 Advanced Computing Group            University of Maine System
 244 Neville Hall (UMS Data Center)              (207) 561-3574
 Orono ME 04469                      steve.cousins at maine.edu