[MITgcm-support] cg convergence vs processor count
Jason Goodman
jgoodman at whoi.edu
Tue Jul 26 14:34:50 EDT 2005
I'm working with a model of nonhydrostatic hydrothermal plume
convection, and have noticed that the conjugate gradient solvers
(cg2d and cg3d) converge differently when the processor count changes.
For example, if I set up a 16-processor run in SIZE.h:
      PARAMETER ( sNx = 45, sNy = 45, OLx = 3, OLy = 3,
     &            nSx = 1, nSy = 1,
     &            nPx = 4, nPy = 4,
with elliptic solver parameters as follows:
# Elliptic solver parameters
&PARM02
cg2dMaxIters=1000,
cg2dTargetResidual=1.E-9,
cg3dMaxIters=1000,
cg3dTargetResidual=1.E-9,
&
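To make the two stopping parameters above concrete, here is a minimal, unpreconditioned conjugate gradient sketch in Python for a 2-D Poisson problem. It is not MITgcm's cg2d: the helper names apply_laplacian and cg_solve are hypothetical, and the residual measure used here (||r|| / ||b||) is an assumption that may not match how cg2d normalizes cg2d_res. It only illustrates that the iteration stops on whichever limit is hit first, so a solve that reports exactly cg2dMaxIters iterations has used up its budget without reaching cg2dTargetResidual.

```python
import math


def apply_laplacian(x, n):
    """Matrix-free 5-point negative Laplacian on an n-by-n grid with zero
    Dirichlet boundaries; x is a flat row-major list of length n*n."""
    y = [0.0] * (n * n)
    for j in range(n):
        for i in range(n):
            k = j * n + i
            s = 4.0 * x[k]
            if i > 0:
                s -= x[k - 1]
            if i < n - 1:
                s -= x[k + 1]
            if j > 0:
                s -= x[k - n]
            if j < n - 1:
                s -= x[k + n]
            y[k] = s
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cg_solve(b, n, max_iters, target_residual):
    """Unpreconditioned CG for A x = b (A = apply_laplacian). Stops when the
    relative residual ||r|| / ||b|| falls below target_residual (analogous to
    cg2dTargetResidual) or after max_iters iterations (analogous to
    cg2dMaxIters). Returns (x, iterations, final relative residual)."""
    x = [0.0] * len(b)
    r = list(b)  # initial residual b - A x with x = 0
    p = list(r)
    rr = dot(r, r)
    bnorm = math.sqrt(dot(b, b))
    if bnorm == 0.0:
        return x, 0, 0.0
    it = 0
    while it < max_iters and math.sqrt(rr) / bnorm >= target_residual:
        ap = apply_laplacian(p, n)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        it += 1
    return x, it, math.sqrt(rr) / bnorm


if __name__ == "__main__":
    n = 20
    b = [1.0] * (n * n)
    _, iters, res = cg_solve(b, n, max_iters=1000, target_residual=1e-9)
    print("converged run:", iters, res)
    _, iters, res = cg_solve(b, n, max_iters=5, target_residual=1e-9)
    print("capped run:", iters, res)
```

With a generous iteration cap the toy problem reaches the target well before 1000 iterations; with a cap of 5 it stops at the cap with a residual still far above the target.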
With monitoring on, cg2d and cg3d report the following on the second time step:
cg2d: Sum(rhs),rhsMax = -1.30451205393456E-15 3.71353934155625E-04
(PID.TID 0004.0001) cg2d_init_res = 1.11786045379710E+00
(PID.TID 0004.0001) cg2d_iters = 179   <----
(PID.TID 0004.0001) cg2d_res = 8.38738994562730E-10   <----
(PID.TID 0004.0001) cg3d_init_res = 1.12208678438658E+00
(PID.TID 0004.0001) cg3d_iters = 347
(PID.TID 0004.0001) cg3d_res = 9.87562302924582E-10
But with a 24-processor run that is otherwise identical in every respect:
      PARAMETER ( sNx = 30, sNy = 45, OLx = 3, OLy = 3,
     &            nSx = 1, nSy = 1,
     &            nPx = 6, nPy = 4,
I get this:
cg2d: Sum(rhs),rhsMax = -1.30451205393456E-15 3.71353934155625E-04
(PID.TID 0004.0001) cg2d_init_res = 1.11786045379710E+00
(PID.TID 0004.0001) cg2d_iters = 1000   <----
(PID.TID 0004.0001) cg2d_res = 5.70277451392061E-06   <----
(PID.TID 0004.0001) cg3d_init_res = 1.12211403181414E+00
(PID.TID 0004.0001) cg3d_iters = 427
(PID.TID 0004.0001) cg3d_res = 9.61570391359023E-10
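A quick arithmetic check of the two SIZE.h layouts and the cg2d numbers reported above, written as a small Python sketch. The dictionary layout and the function names global_grid and cg2d_converged are illustrative only. It assumes the standard SIZE.h relations Nx = sNx*nSx*nPx and Ny = sNy*nSy*nPy, and it treats a run as converged when the reported cg2d_res is below cg2dTargetResidual, which is an assumption about how the monitor value relates to the solver's stopping test.

```python
# Values copied from the SIZE.h settings and monitor output in this post.
LAYOUTS = {
    16: {"sNx": 45, "sNy": 45, "nSx": 1, "nSy": 1, "nPx": 4, "nPy": 4},
    24: {"sNx": 30, "sNy": 45, "nSx": 1, "nSy": 1, "nPx": 6, "nPy": 4},
}
# processor count -> (cg2d_iters, cg2d_res) reported on the second time step
CG2D = {
    16: (179, 8.38738994562730e-10),
    24: (1000, 5.70277451392061e-06),
}
CG2D_MAX_ITERS = 1000
CG2D_TARGET_RESIDUAL = 1.0e-9


def global_grid(layout):
    """Global (Nx, Ny) implied by a tiling, assuming Nx = sNx*nSx*nPx and
    Ny = sNy*nSy*nPy as in the standard SIZE.h."""
    nx = layout["sNx"] * layout["nSx"] * layout["nPx"]
    ny = layout["sNy"] * layout["nSy"] * layout["nPy"]
    return nx, ny


def cg2d_converged(nprocs):
    """Treat a run as converged if its reported cg2d_res is below the target."""
    _, res = CG2D[nprocs]
    return res < CG2D_TARGET_RESIDUAL


for nprocs in sorted(LAYOUTS):
    iters, res = CG2D[nprocs]
    status = "converged" if cg2d_converged(nprocs) else "stopped at cg2dMaxIters"
    print(nprocs, global_grid(LAYOUTS[nprocs]), iters, res, status)
```

Both layouts describe the same 180 x 180 global grid, so only the split of that grid among processors changes; the 24-processor cg2d solve stops at the 1000-iteration cap with a residual roughly 5700 times the target.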
The 24-processor run eventually blows up, apparently because cg2d
fails to converge.
My understanding was that the numerical solution should be
independent of the underlying processor layout. Is this correct? If
not, what do I need to change to get the 24-processor run working?
If so, what the heck is going on?
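On the question of layout independence: one general reason results need not be bit-for-bit identical across processor layouts is that global reductions, such as the dot products inside a CG iteration, combine per-tile partial sums, and floating-point addition is not associative. The Python sketch below uses a hypothetical helper, tiled_sum (not an MITgcm routine), to show that changing only the grouping of the additions changes the last bit of the result. Whether round-off differences of this size could account for 179 versus 1000 cg2d iterations is not established here.

```python
def tiled_sum(values, ntiles):
    """Hypothetical model of a distributed global sum: split `values` into
    `ntiles` contiguous chunks, sum each chunk locally, then add the partial
    sums. Only the grouping of the additions changes with `ntiles`."""
    if ntiles < 1:
        raise ValueError("ntiles must be at least 1")
    chunk = len(values) // ntiles
    partials = []
    for t in range(ntiles):
        start = t * chunk
        # the last tile takes any remainder
        end = len(values) if t == ntiles - 1 else start + chunk
        partials.append(sum(values[start:end]))
    return sum(partials)


vals = [0.1, 0.2, 0.3]
one_tile = tiled_sum(vals, 1)   # (0.1 + 0.2) + 0.3
two_tiles = tiled_sum(vals, 2)  # 0.1 + (0.2 + 0.3)
print(one_tile, two_tiles, one_tile == two_tiles)
# prints: 0.6000000000000001 0.6 False
```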