[MITgcm-support] Problems with mpi on beowulf

Ed Hill ed at eh3.com
Fri Sep 17 23:58:50 EDT 2004


On Fri, 2004-09-17 at 23:16, Yuan Lian wrote:
> Hi,
> 
> I encountered a problem when I ran MITgcm with mpich on a beowulf cluster.
> Here is the error message from output:
> 
> >Warning: no access to tty (Bad file descriptor).
> >Thus no job control in this shell.
> 
> This seems to indicate that I didn't choose the proper shell.  I was
> running the code under csh, which was defined in the script file as
> "#PBS -S /bin/csh".
> 
> >(PID.TID 0000.0001) *** ERROR *** S/R EESET_PARMS
> >(PID.TID 0000.0001) *** ERROR *** Unable to open execution environment
> >(PID.TID 0000.0001) *** ERROR *** parameter file "eedata"

Hi Yuan,

It sounds like something is the matter with your MPI setup.  Are you
sure that it's working correctly?  Have you tried any simple MPI example
programs?  And have you verified that you're using the correct options
for the "mpirun" (or equivalent) command used to start MPI programs?

Getting MPI set up and working can be a bit confusing, so it's good to
run a few simple examples first.  The mpich implementation includes some
example codes for exactly this kind of testing.
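
For instance, a minimal Fortran test along these lines should print one
line per process.  This is only a sketch; it assumes the mpich "mpif77"
and "mpirun" wrappers are on your path and belong to the same mpich
installation you intend to build MITgcm against:

      program hello
C     Minimal MPI check: each process reports its rank.
      implicit none
      include 'mpif.h'
      integer ierr, rank, nprocs
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, rank, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, nprocs, ierr )
      print *, 'hello from process ', rank, ' of ', nprocs
      call MPI_FINALIZE( ierr )
      end

  $ mpif77 hello.F -o hello
  $ mpirun -np 4 ./hello

If that doesn't run cleanly under your PBS script (with whatever
-machinefile or device-specific options your mpich build needs), then
the problem is in the mpich/PBS setup rather than in MITgcm itself.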


> This error message shows that the code can't read eedata (the default
> folder was changed to my home directory instead of workdir!!).  I defined
> "usingMPI = .TRUE." and moved all the programs from workdir to my home
> directory; then the code will run, however the results of the simulation
> are obviously wrong compared to the single-CPU case.  Here are the
> questions:
> 
> 1. How do I define the variables in eedata properly?

The "eedata" examples in MITgcm/verification/* all work on a variety of
different platforms including Linux Beowulf clusters.  While its
possible that some (eg. F90/F95) compilers may not like the namelist
syntax, the existing syntax shouldn't be a problem with g77 or pgf77.
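
For reference, the eedata files in the verification experiments are
very short.  Roughly (this is just a sketch; check an experiment under
MITgcm/verification/ for the exact parameters your checkpoint expects),
they look like:

  # Execution environment parameters
  # nTx, nTy :: number of threads per process in X and Y
   &EEPARMS
   nTx=1,
   nTy=1,
   usingMPI=.TRUE.,
   &

with usingMPI=.FALSE. for single-process runs.  Note that the
"Unable to open ... eedata" error above just means the file wasn't
found in the directory the executable was started from, so it usually
points to a working-directory or PBS staging problem rather than to
the contents of eedata itself.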


> 2. Do I still need joinmds to combine the outputs when I use mpirun to
>    run the code?  (It seems that all the outputs have already been
>    combined.)

If you haven't specified any of the "single CPU IO" options with the old
MDSIO package, then you will get per-tile output.
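
In that case you should see one set of files per tile in the run
directory (names roughly of the form T.0000000000.001.001.data), and
joinmds can stitch them together afterwards.  If you would rather have
the model write global files directly, something like the following in
the main "data" file should do it (a sketch only; check the manual for
the exact flag names in your checkpoint):

   &PARM01
   # (existing PARM01 entries)
    globalFiles=.TRUE.,
   &

If you are seeing only a single set of files when you expected
per-tile output, that is another hint that only one MPI process is
actually being started.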


> 3. Has anybody successfully run the code on a Beowulf cluster?  (The
>    default compiler is g77 or pgf77, with mpich.)

MITgcm is run on a daily basis on a number of different Beowulf-style
clusters, including both Intel Xeon and AMD Opteron hardware.  So it's
certainly capable!

It sounds like you have an mpich configuration problem.  Some notes on
mpich installation, the use of MITgcm with mpich, and example
configuration files are available at:

  http://mitgcm.org/sealion/online_documents/node93.html
  http://mitgcm.org/pipermail/mitgcm-devel/2004-March/000470.html
  http://mitgcm.org/testing/results/2004_09/tr_myrinet-3-30_20040916_0/
  http://mitgcm.org/cgi-bin/viewcvs.cgi/MITgcm/tools/build_options/
    ==> which includes numerous examples of "optfiles" for 
        systems using mpich
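
Once you have an optfile that matches your compiler, a typical MITgcm
build against mpich looks something like this (a sketch only;
substitute your own experiment directory, optfile, and paths):

  $ cd verification/exp2/build
  $ ../../../tools/genmake2 -mods=../code \
        -of=../../../tools/build_options/linux_ia32_g77 -mpi
  $ make depend
  $ make

and the resulting mitgcmuv executable is then started with mpirun from
a run directory containing "data", "eedata", and the other input files.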

Good luck with your Beowulf cluster!

Ed

-- 
Edward H. Hill III, PhD
office:  MIT Dept. of EAPS;  Rm 54-1424;  77 Massachusetts Ave.
             Cambridge, MA 02139-4307
emails:  eh3 at mit.edu                ed at eh3.com
URLs:    http://web.mit.edu/eh3/    http://eh3.com/
phone:   617-253-0098
fax:     617-253-4464
