[MITgcm-devel] Re: broken testreport

Wed Dec 1 17:14:08 EST 2004

On Wed, 2004-12-01 at 13:14 -0800, Dimitris Menemenlis wrote:
> Ed, I ran the same tests with checkpoint56, as Patrick suggested, and
> got exactly the same results, i.e., the "-mpi" option fails on all four
> machines, and single-process testreport works on orion and columbia, the
> Altices, and fails on gemini and lomax, the Origins.  To see whether this
> failure had something to do with a wrong automatic choice of option files,
> I tried:
> 
> ./testreport -mpi -of=../tools/build_options/linux_ia64_efc+mpi_altix -t global_with_exf
> 
> on orion and columbia.  This also failed with the following complaint:
> 
> running ... mpirun must be used to launch all MPI applications
> make: *** [output.txt] Error 255
> failed
> 
> Submitting the job using "mpirun -np 2 mitgcmuv" worked OK,
> but there was still a problem with junk I/O files.
> 
> adding usesinglecpuio=.true. to PARM01 in the data file fixed
> the I/O problem
> 
> so this would explain the "#$%@ I/O" problem that I reported
> yesterday.
> 
> It would be great if you were able to fine-tune testreport so that
> it works automatically on the Ames and the JPL supercomputers.

Hi Dimitris & Patrick,

I'm sending this to the devel list since I think everyone doing MPI work
should see this discussion.

First, I'm sorry that testreport doesn't work 100% automatically on all
MPI systems.  The problem is that the $MPIRUN or equivalent command is
different on every system.  There are a variety of different $MPIRUN
executables and they all take slightly different arguments, etc.  And
these $MPIRUN things are further complicated by their interactions with
whatever queueing system is used.  Many systems (e.g., UCAR/NCAR) will not
allow you to run any MPI programs--even small ones--outside of their
queueing system.  So the MPI stuff *MUST* be submitted through whatever
queue system is in effect.

And this means that you have to get all of the picky queue system
details right including (but not limited to!):

  - $QRUN syntax (PBS vs LoadLeveler vs Condor vs ...)
  - paths to scratch disks
  - shell parameters
  - "module" conventions
  - etc-etc-etc...

and these things all vary wildly from one system to the next and often
need per-user customizations.  The upshot is that I don't expect to
write a 100% fully automated MPI version of testreport anytime soon.
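
To make the first item in that list concrete, here is a minimal sketch
of how the submit step alone differs across the three queue systems
named above.  The function name submit_cmd and the job-file names are
hypothetical and are not part of testreport; qsub, llsubmit, and
condor_submit are the usual submit commands for PBS, LoadLeveler, and
Condor, and whatever extra options a given site needs are not shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of testreport): print the command that
# would submit a job file under a given queue system.
submit_cmd() {
    qsys="$1"      # queue system name: pbs, loadleveler, or condor
    jobfile="$2"   # path to the job description file (placeholder)
    case "$qsys" in
        pbs)         echo "qsub $jobfile" ;;
        loadleveler) echo "llsubmit $jobfile" ;;
        condor)      echo "condor_submit $jobfile" ;;
        *)
            # Anything else has to be added by hand for that site.
            echo "unknown queue system: $qsys" >&2
            return 1
            ;;
    esac
}
```

The function only prints the command, so it can be inspected before
anything is actually submitted.  The job file itself still has to carry
the site-specific paths, shell settings, and module setup.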

So here's my proposed solution: we all check in example scripts (esp.
examples that show how to use testreport) in the following location:

  MITgcm/tools/examples/${SYSTEM_NAME}/${SCRIPT_NAME}

and then we try to keep them current.  I just checked in scripts for
cg01, ITRDA, and the IBM SPs at UCAR/NCAR.
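
A rough sketch of the kind of logic such a per-system script might
carry, assuming a plain sh environment.  The function names launch_cmd
and example_path and the script name run_testreport are hypothetical
and do not come from the checked-in scripts; the only launch line taken
from this thread is the mpirun one that Dimitris reported working,
presumably on orion and columbia.

```shell
#!/bin/sh
# Hypothetical sketch: none of these names come from the MITgcm tree.

# Print the command that would launch the model on a given system.
# Only the entry below is taken from this thread; other systems would
# need their own entries added by someone who runs there.
launch_cmd() {
    system="$1"   # system name, e.g. orion
    nproc="$2"    # number of MPI processes
    case "$system" in
        orion|columbia)
            echo "mpirun -np $nproc mitgcmuv"
            ;;
        *)
            echo "no launch recipe for system: $system" >&2
            return 1
            ;;
    esac
}

# Print where a script would live under the proposed layout
# MITgcm/tools/examples/${SYSTEM_NAME}/${SCRIPT_NAME}.
example_path() {
    echo "MITgcm/tools/examples/$1/$2"
}
```

For example, "launch_cmd orion 2" prints the launch line for a
two-process run, and "example_path orion run_testreport" prints the
path where that system's script would be checked in.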

Or do you folks have any better ideas?

Ed

-- 
Edward H. Hill III, PhD
office:  MIT Dept. of EAPS;  Rm 54-1424;  77 Massachusetts Ave.
             Cambridge, MA 02139-4307
emails:  eh3 at mit.edu                ed at eh3.com
URLs:    http://web.mit.edu/eh3/    http://eh3.com/
phone:   617-253-0098
fax:     617-253-4464