[MITgcm-support] cg convergence vs processor count

Tue Jul 26 14:34:50 EDT 2005

I'm working with a model of nonhydrostatic hydrothermal plume  
convection, and have noticed that the conjugate gradient parts of the  
code converge differently if the processor count changes.  For  
example, if I set up a 16-processor run in SIZE.h:

       PARAMETER (sNx =  45, sNy =  45, OLx =   3, OLy =   3,
      &           nSx =   1, nSy =   1,
      &           nPx =   4, nPy =   4,

with elliptic solver parameters as follows:

# Elliptic solver parameters
&PARM02
cg2dMaxIters=1000,
cg2dTargetResidual=1.E-9,
cg3dMaxIters=1000,
cg3dTargetResidual=1.E-9,
&

then, with monitoring on, cg2d and cg3d report the following on the second time step:

cg2d: Sum(rhs),rhsMax =  -1.30451205393456E-15  3.71353934155625E-04
(PID.TID 0004.0001)                    cg2d_init_res =   1.11786045379710E+00
(PID.TID 0004.0001)                       cg2d_iters =   179                    <----
(PID.TID 0004.0001)                         cg2d_res =   8.38738994562730E-10   <----
(PID.TID 0004.0001)                    cg3d_init_res =   1.12208678438658E+00
(PID.TID 0004.0001)                       cg3d_iters =   347
(PID.TID 0004.0001)                         cg3d_res =   9.87562302924582E-10

But with a 24-processor run, otherwise absolutely identical:

       PARAMETER (sNx =  30, sNy =  45, OLx =   3, OLy =   3,
      &           nSx =   1, nSy =   1,
      &           nPx =   6, nPy =   4,

I get this:

cg2d: Sum(rhs),rhsMax =  -1.30451205393456E-15  3.71353934155625E-04
(PID.TID 0004.0001)                    cg2d_init_res =   1.11786045379710E+00
(PID.TID 0004.0001)                       cg2d_iters =   1000                   <----
(PID.TID 0004.0001)                         cg2d_res =   5.70277451392061E-06   <----
(PID.TID 0004.0001)                    cg3d_init_res =   1.12211403181414E+00
(PID.TID 0004.0001)                       cg3d_iters =   427
(PID.TID 0004.0001)                         cg3d_res =   9.61570391359023E-10
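
As a sanity check that the two layouts really describe the same problem, here is a small Python sketch that recomputes the global grid size from the two SIZE.h fragments. It assumes the usual SIZE.h relations Nx = sNx*nSx*nPx and Ny = sNy*nSy*nPy; the helper name global_grid is made up for this illustration and is not part of MITgcm.

```python
# Tile sizes and process counts copied from the two SIZE.h fragments above.
layouts = {
    "16-processor": dict(sNx=45, sNy=45, nSx=1, nSy=1, nPx=4, nPy=4),
    "24-processor": dict(sNx=30, sNy=45, nSx=1, nSy=1, nPx=6, nPy=4),
}


def global_grid(p):
    """Return (Nx, Ny, process count) for one layout.

    Assumes Nx = sNx*nSx*nPx and Ny = sNy*nSy*nPy (hypothetical helper).
    """
    nx = p["sNx"] * p["nSx"] * p["nPx"]
    ny = p["sNy"] * p["nSy"] * p["nPy"]
    return nx, ny, p["nPx"] * p["nPy"]


for name, params in layouts.items():
    print(name, global_grid(params))
# 16-processor (180, 180, 16)
# 24-processor (180, 180, 24)
```

Both layouts tile the same 180 x 180 horizontal grid, so only the decomposition in x differs (4 tiles of 45 points versus 6 tiles of 30 points).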

The 24-processor run eventually blows up, apparently due to cg2d  
convergence failure.
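
To make the cg2d comparison explicit, here is a rough Python sketch that checks the reported numbers against the PARM02 settings. It only compares the logged values with cg2dMaxIters and cg2dTargetResidual; how the solver itself decides when to stop is not modelled here, and the names hit_cap and met_target are made up for this illustration.

```python
MAX_ITERS = 1000           # cg2dMaxIters from PARM02
TARGET_RESIDUAL = 1.0e-9   # cg2dTargetResidual from PARM02

# (cg2d_iters, cg2d_res) as reported on the second time step of each run.
runs = {
    "16-processor": (179, 8.38738994562730e-10),
    "24-processor": (1000, 5.70277451392061e-06),
}


def hit_cap(iters):
    """True if the solver used every iteration it was allowed."""
    return iters >= MAX_ITERS


def met_target(res):
    """True if the reported residual is at or below the requested target."""
    return res <= TARGET_RESIDUAL


for name, (iters, res) in runs.items():
    print(name, "hit_cap =", hit_cap(iters), "met_target =", met_target(res))
# 16-processor hit_cap = False met_target = True
# 24-processor hit_cap = True met_target = False
```

In the 24-processor run cg2d stops at the iteration cap with a residual several thousand times larger than the target, while cg3d still reaches its target in both runs.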

My understanding was that the numerical solution should be  
independent of the underlying processor layout.  Is this correct?  If  
not, what do I need to change to get the 24-processor run working?   
If so, what the heck is going on?