[MITgcm-support] Problems running the MITgcm in parallel

Yuan Lian lian at U.Arizona.EDU
Fri Oct 1 07:03:38 EDT 2004


Hi, Kevin and Ed,

Actually, I encountered the same problem with extreme values during
parallel computation on a Beowulf cluster. However, it depends on how the
number of processes in the x and y directions is defined. If I set nPx=1
and nPy=N, where N is the number of processes (four in my simulation), the
code runs without any problem. But if I set more than one process in the x
direction, the code gives extreme values. I think this might be caused by
bad communication between sub-grids in the x direction.
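
For anyone comparing the two layouts, here is a minimal sketch (in Python, not part of MITgcm) of a sanity check on the decomposition. It relies on the SIZE.h relations Nx = sNx*nSx*nPx and Ny = sNy*nSy*nPy, and on the number of MPI processes having to equal nPx*nPy. The function name check_decomposition and the 64x64 grid are made up for illustration. Passing this check does not rule out a communication problem; it only excludes one simple source of error when changing nPx.

```python
def check_decomposition(Nx, Ny, sNx, sNy, nSx, nSy, nPx, nPy, n_mpi_procs):
    """Return a list of problems; an empty list means the layout is consistent.

    Hypothetical helper: the parameter names follow MITgcm's SIZE.h, but the
    function itself is not part of MITgcm.
    """
    problems = []
    # The global x extent must be tiled exactly by tile size * tiles * processes.
    if sNx * nSx * nPx != Nx:
        problems.append(
            "x: sNx*nSx*nPx = %d, expected Nx = %d" % (sNx * nSx * nPx, Nx))
    # Same requirement in y.
    if sNy * nSy * nPy != Ny:
        problems.append(
            "y: sNy*nSy*nPy = %d, expected Ny = %d" % (sNy * nSy * nPy, Ny))
    # The MPI launcher must start exactly nPx*nPy processes.
    if nPx * nPy != n_mpi_procs:
        problems.append(
            "nPx*nPy = %d, but %d MPI processes requested" % (nPx * nPy, n_mpi_procs))
    return problems


if __name__ == "__main__":
    # Assumed 64x64 grid on four processes (illustrative values only).
    # Layout that works in the thread's description: nPx=1, nPy=4.
    print(check_decomposition(64, 64, 64, 16, 1, 1, 1, 4, 4))
    # nPx raised to 2 without shrinking sNx or adjusting nPy.
    print(check_decomposition(64, 64, 64, 16, 1, 1, 2, 4, 4))
```

If nPx is changed, sNx (and usually nPy and sNy) must be changed with it and the code recompiled, since these values are fixed at compile time in SIZE.h.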

I haven't seen any problems during compilation. It is important to make
sure that both the compiler and mpich are built for the same architecture,
either 64-bit or ia32. Errors may occur if they are not consistent. What I
did was to write another script in the "MITgcm/tools/build_options" folder
to make sure the compiler links to the right version of the mpich libraries.
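
One way to check the 32-bit versus 64-bit consistency is to read the ELF header of the files involved. The sketch below is in Python and assumes Linux ELF files: the fifth byte of an ELF file (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. The helper names elf_class and same_word_size are made up for illustration. It works on object files, shared libraries and executables; a static ".a" library is an ar archive rather than an ELF file, so it reports None for those and a member would have to be extracted first.

```python
import sys

ELF_MAGIC = b"\x7fELF"
# EI_CLASS values defined by the ELF format.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(path):
    """Return "32-bit", "64-bit", or None if the file is not a recognised ELF file."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4])


def same_word_size(paths):
    """True only if every path is an ELF file and all share one class."""
    classes = {elf_class(p) for p in paths}
    return None not in classes and len(classes) == 1


if __name__ == "__main__":
    # Usage: pass any mix of object files, shared libraries and executables,
    # for example a compiled MITgcm object file and the mpich shared library.
    for p in sys.argv[1:]:
        print(p, elf_class(p))
    print("consistent:", same_word_size(sys.argv[1:]))
```

If the files disagree, the optfile is picking up an mpich build that does not match the compiler's target.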

-Yuan

On Thu, 30 Sep 2004, Ed Hill wrote:

> On Thu, 2004-09-30 at 17:55, Kevin Oliver wrote:
> > Hello,
> >
> > I'm running MITgcm on a beowulf cluster and Linux platform.
> > There are two problems, which may be related, which mean I
> > can't get it to work in parallel.
>
> Hi Kevin,
>
> We use MITgcm with numerous compilers (GNU, Intel, PGI) and MPI
> libraries on a routine basis.  So it's somewhat unlikely that you've
> found a bug in MITgcm.  What's more likely is that you have a problem
> with your MPI and/or compiler setup.
>
> So my first question is: are you certain that your mpich-gm install and
> your compiler (PGI) are working correctly together?  That is, can you
> create simple "hello world"-type programs that compile, link, and run in
> parallel?  This should be the first thing to test.
>
>
> > 1. When I use the genmake2 command, the option -mpi is not
> > recognised. Using the -optfile option, I can link to the
> > libraries for mpi compilation, and it appears to compile
> > successfully using mpich-gm and pgi (Portland Group
> > compilers). However...
>
> Could you please send us the *exact* syntax that you used to invoke
> genmake2 and the *exact* (please cut-and-paste) error message(s) along
> with the MITgcm version that you're using?
>
>
> > 2. When I run the code in parallel, after (I think) 1 time step, the
> > execution is stopped due to extreme values (there are lots of NaNs,
> > although not only NaNs). I get the message "Possibly you have
> > different setenv PARALLEL and nThreads?" The equivalent run
> > works fine sequentially. I've read about parallel threads, but
> > haven't managed to work out where to set these.
>
> How are you running the code?  Are you certain that you've used the
> correct "mpirun ..." or equivalent syntax?
>
> Again, I think it would be best for you to first confirm that you have
> the compiler and MPI libraries working by running a simple test
> program.  When you're sure that those parts are functioning, then move
> on to the MITgcm model.
>
> Ed
>
> --
> Edward H. Hill III, PhD
> office:  MIT Dept. of EAPS;  Rm 54-1424;  77 Massachusetts Ave.
>              Cambridge, MA 02139-4307
> emails:  eh3 at mit.edu                ed at eh3.com
> URLs:    http://web.mit.edu/eh3/    http://eh3.com/
> phone:   617-253-0098
> fax:     617-253-4464
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://dev.mitgcm.org/mailman/listinfo/mitgcm-support
>


