[MITgcm-support] cg convergence vs processor count

Ed Hill ed at eh3.com
Wed Jul 27 12:54:49 EDT 2005


On Wed, 2005-07-27 at 11:36 -0400, Jason Goodman wrote:
> > Alas, the non-associative nature of floating-point arithmetic
> > ensures that the numerical solution actually depends on the order in
> > which reductions (such as the residual calculation in CG) are
> > evaluated.  However, it is rather unusual for these differences to
> > lead to such huge differences in solver behaviour (non-convergence
> > vs. convergence).  This is rather worrying and may be indicative of
> > some other underlying problem with your setup.  Large differences in
> > summation results with differing reduction order tend to occur when
> > values in significantly different exponent ranges are added together.
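
That reduction-order sensitivity is easy to demonstrate.  Here is a
minimal sketch in plain Python, with made-up values in very different
exponent ranges (nothing MITgcm-specific):

    vals = [1.0e16, 1.0, -1.0e16, 1.0]

    # One reduction order: a serial, left-to-right sum.
    s1 = ((vals[0] + vals[1]) + vals[2]) + vals[3]

    # Another order: per-"tile" partial sums combined afterwards,
    # as a parallel reduction might do it.
    s2 = (vals[0] + vals[2]) + (vals[1] + vals[3])

    print(s1, s2)   # 1.0 2.0 -- same data, different answers

In s1 the first 1.0 is absorbed by 1.0e16 (at that magnitude the spacing
between adjacent doubles is 2.0, so adding 1.0 rounds back to 1.0e16)
and is lost before the big terms cancel; in s2 both 1.0s survive.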
> 
> Your point about floating-point errors is a good one, but I agree
> that the very large difference is odd.
> 
> I've tried running the exp5 verification (rotating convection from
> widespread surface buoyancy loss) with varying numbers of processors,
> and don't have this problem, so I suspect there's a problem with my
> experimental setup rather than my hardware.  I also notice that
> convergence seems to be slower for the point-source problem than for
> a broad source with a similar model domain.
> 
> I'm doing a point-source convection problem with a rather dense grid
> (200x200 horizontal, 133 vertical); the surface buoyancy forcing is
> isolated at a single gridpoint.  Could the fact that most of the
> domain is "boring", in that initially only one point in a million has
> anything going on, cause problems with solving the pressure field?
> 
> If so, is there any way to apply a preconditioner or some sort of
> weighting to encourage the CG algorithm to focus its effort on the
> buoyancy source, where the action is?  My long-term goal is to use a
> grid with narrow spacing near the source and wider spacing farther
> away, but I'm having blowup problems with that, so I'm trying to get
> the evenly-spaced grid working first.

Hi Jason,

For elliptic problems, conjugate-gradient methods usually do well in the
neighborhood of a discrete source, since they tend to quickly correct
shorter-wavelength errors.  What can take a long time is (perhaps
surprisingly) the reduction of longer-wavelength errors, particularly
those that approach the size of the problem domain.  Thus, it's entirely
possible that your "large boring area" is where the problem is slowly
converging.  For a more rigorous discussion of this topic, see the many
multigrid references collected at, for instance:

   http://www.mgnet.org/mgnet-books-wesseling.html
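
Your observation that the point source converges more slowly than a
broad source is consistent with this.  Below is a toy, self-contained
illustration in Python/numpy -- the grid size, tolerance, and source
shapes are all made up, and the solver is plain unpreconditioned CG,
not MITgcm's -- comparing iteration counts for a point source (whose
solution contains energy at all wavelengths) against a broad, smooth
source on the same grid:

    import numpy as np

    def apply_laplacian(x, n):
        # 5-point Laplacian on an n x n grid, zero Dirichlet boundaries.
        u = x.reshape(n, n)
        out = 4.0 * u
        out[1:, :]  -= u[:-1, :]
        out[:-1, :] -= u[1:, :]
        out[:, 1:]  -= u[:, :-1]
        out[:, :-1] -= u[:, 1:]
        return out.ravel()

    def cg_iterations(b, n, tol=1.0e-8, maxiter=20000):
        # Plain CG; returns iterations until ||r|| <= tol * ||b||.
        x = np.zeros_like(b)
        r = b.copy()                 # r = b - A*0
        p = r.copy()
        rs = r @ r
        bnorm = np.linalg.norm(b)
        for k in range(1, maxiter + 1):
            Ap = apply_laplacian(p, n)
            alpha = rs / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) <= tol * bnorm:
                return k
            p = r + (rs_new / rs) * p
            rs = rs_new
        return maxiter

    n = 100

    # (a) point source: significant energy at every wavelength
    b_point = np.zeros(n * n)
    b_point[(n // 2) * n + n // 2] = 1.0

    # (b) broad smooth source: energy mostly in long wavelengths
    s = np.arange(1, n + 1) / (n + 1)
    X, Y = np.meshgrid(s, s, indexing="ij")
    b_broad = np.exp(-((X - 0.5)**2 + (Y - 0.5)**2) / 0.02).ravel()

    print("point source:", cg_iterations(b_point, n), "iterations")
    print("broad source:", cg_iterations(b_broad, n), "iterations")

The smooth source excites only a narrow band of long-wavelength modes,
so the effective condition number CG sees is much smaller and it should
finish in noticeably fewer iterations than the point-source case.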

Chris Hill explained to me that some previous MITgcm versions had a
per-tile multigrid preconditioner for the conjugate-gradient solver
that, while it (sometimes?) sped up convergence, also created or
accentuated slight discontinuities at the tile edges.  So that
implementation had some annoying side effects and is no longer used.
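
I don't know the details of that old implementation, but the per-tile
idea is essentially block-Jacobi preconditioning: apply M^(-1) to the
residual by solving each tile's local problem independently, pretending
the field is zero just outside the tile's own edges.  A schematic
Python/numpy sketch of my own (not the MITgcm code; n is assumed
divisible by the tile count):

    import numpy as np

    def tile_laplacian(m):
        # Dense 5-point Laplacian for one m x m tile; values just
        # outside the tile are treated as zero.
        N = m * m
        A = 4.0 * np.eye(N)
        for i in range(m):
            for j in range(m):
                k = i * m + j
                if i > 0:     A[k, k - m] = -1.0
                if i < m - 1: A[k, k + m] = -1.0
                if j > 0:     A[k, k - 1] = -1.0
                if j < m - 1: A[k, k + 1] = -1.0
        return A

    def per_tile_preconditioner(r, n, tiles):
        # Apply M^(-1) r tile by tile; cross-tile coupling is dropped.
        # (In practice one would factor A_loc once, not re-solve.)
        m = n // tiles
        A_loc = tile_laplacian(m)
        R = r.reshape(n, n)
        Z = np.zeros((n, n))
        for ti in range(0, n, m):
            for tj in range(0, n, m):
                rhs = R[ti:ti + m, tj:tj + m].ravel()
                Z[ti:ti + m, tj:tj + m] = \
                    np.linalg.solve(A_loc, rhs).reshape(m, m)
        return Z.ravel()

Because neighboring tiles never see each other's values, the correction
M^(-1) r is generically discontinuous across tile boundaries -- which is
presumably the seam effect Chris described.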

I suspect that there are better parallel multigrid/multilevel solvers
that could speed up MITgcm's elliptic problem.  Someone just needs to
develop and test them.  And I suspect that such an approach could help
in many situations, including yours.  But that's only intuition.

This would be a fun topic for a paper.  Is anyone interested?

Ed


-- 
Edward H. Hill III, PhD
office:  MIT Dept. of EAPS;  Rm 54-1424;  77 Massachusetts Ave.
             Cambridge, MA 02139-4307
emails:  eh3 at mit.edu                ed at eh3.com
URLs:    http://web.mit.edu/eh3/    http://eh3.com/
phone:   617-253-0098
fax:     617-253-4464



