[MITgcm-devel] Re: broken testreport
Ed Hill
eh3 at mit.edu
Wed Dec 1 17:14:08 EST 2004
On Wed, 2004-12-01 at 13:14 -0800, Dimitris Menemenlis wrote:
> Ed, I ran the same tests with checkpoint56, as Patrick suggested, and
> get exactly the same results, i.e., the "-mpi" option fails on all four machines
> and single-process testreport works on orion and columbia, the Altices,
> and fails on gemini and lomax, the Origins. To see whether this failure
> had something to do with wrong automatic choice of option files, I tried:
>
> ./testreport -mpi -of=../tools/build_options/linux_ia64_efc+mpi_altix -t global_with_exf
>
> on orion and columbia. This also failed with following complaint:
>
> running ... mpirun must be used to launch all MPI applications
> make: *** [output.txt] Error 255
> failed
>
> Submitting the job using "mpirun -np 2 mitgcmuv" worked OK,
> but there was still a problem with junk I/O files.
>
> Adding usesinglecpuio=.true. to PARM01 in the data file fixed
> the I/O problem, so this would explain the "#$%@ I/O" problem
> that I reported yesterday.
>
> It would be great if you were able to fine-tune testreport so that
> it works automatically on the Ames and the JPL supercomputers.
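Dimitris's workaround for the junk I/O files is a one-line namelist change. If it has to be applied to the data files of several experiments, a small shell helper could do it. This is only a sketch: the function name add_single_cpu_io is hypothetical, and it assumes the usual MITgcm data-file layout in which the PARM01 namelist opens with a " &PARM01" line.

```shell
#!/bin/sh
# Hypothetical helper: insert "usesinglecpuio=.true.," into the PARM01
# namelist of an MITgcm "data" file, directly after the &PARM01 line.
add_single_cpu_io() {
  data_file=$1
  # Leave the file alone if the parameter already appears anywhere in it.
  if grep -qi 'usesinglecpuio' "$data_file"; then
    return 0
  fi
  tmp="$data_file.tmp.$$"
  # Copy every line; after the &PARM01 header (any case), add the setting.
  awk '{ print }
       toupper($0) ~ /^[ \t]*&PARM01/ { print " usesinglecpuio=.true.," }' \
    "$data_file" > "$tmp" && mv "$tmp" "$data_file"
}
```

Running it twice is harmless, since it does nothing once the parameter appears in the file. That also means an existing usesinglecpuio=.false. is left untouched and would have to be changed by hand.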
Hi Dimitris & Patrick,
I'm sending this to the devel list since I think everyone doing MPI work
should see this discussion.
First, I'm sorry that testreport doesn't work 100% automatically on all
MPI systems. The problem is that the $MPIRUN (or equivalent) command is
different on every system. There are many different $MPIRUN
executables, each taking slightly different arguments, and their
behavior is further complicated by their interactions with whatever
queueing system is in use. Many systems (e.g., UCAR/NCAR) will not
allow you to run any MPI programs, even small ones, outside of their
queueing system. So the MPI stuff *MUST* be submitted through whatever
queue system is in effect.
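Because the launcher is the part that changes from machine to machine, one option is to keep it out of the script body entirely. A minimal sketch, assuming a hypothetical MPIRUN environment variable (this is not a documented testreport setting) whose default is the "mpirun -np 2" form that worked on the Altix systems:

```shell
#!/bin/sh
# Sketch: take the MPI launcher from a (hypothetical) MPIRUN variable
# instead of hard-coding it. The default is the "mpirun -np 2" form that
# worked on the Altix systems; mitgcmuv is assumed to be in the current
# directory.
run_mitgcm() {
  # Deliberately unquoted so "mpirun -np 2" splits into a command
  # and its arguments.
  ${MPIRUN:-mpirun -np 2} ./mitgcmuv
}
```

On orion or columbia MPIRUN could be left unset; on another machine it would be set to that system's launcher and arguments, which still have to be looked up per system.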
And this means that you have to get all of the picky queue system
details right including (but not limited to!):
- $QRUN syntax (PBS vs LoadLeveler vs Condor vs ...)
- paths to scratch disks
- shell parameters
- "module" conventions
- etc-etc-etc...
and these things all vary wildly from one system to the next and often
need per-user customizations. The upshot is that I don't expect to
write a 100% fully automated MPI version of testreport anytime soon.
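As one small illustration of how the queue systems differ, even the submit command is different for each of the three named above. The sketch below only maps a queue-system name to its usual submit command (qsub, llsubmit, condor_submit); the function name submit_cmd is hypothetical, and everything else in the list (scratch paths, shell setup, modules) still has to be handled per system.

```shell
#!/bin/sh
# Sketch: map a queueing-system name to its usual job-submission command.
# Job directives, scratch paths, shell setup and "module" conventions are
# NOT handled here; they still differ per system.
submit_cmd() {
  case $1 in
    pbs)         echo qsub ;;
    loadleveler) echo llsubmit ;;
    condor)      echo condor_submit ;;
    *)           echo "unknown queue system: $1" >&2; return 1 ;;
  esac
}
```

A job script would then be submitted with something like "$(submit_cmd pbs) job.sh", where job.sh is a made-up name for a system-specific script.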
So here's my proposed solution: we all check in example scripts
(especially ones that show how to use testreport) in the following location:
MITgcm/tools/examples/${SYSTEM_NAME}/${SCRIPT_NAME}
and then we try to keep them current. I just checked in scripts for
cg01, ITRDA, and the IBM SPs at UCAR/NCAR.
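To make the proposed layout concrete, here is a sketch of a helper that finds and runs a checked-in example script. The function name run_example is hypothetical; only the tools/examples/${SYSTEM_NAME}/${SCRIPT_NAME} layout comes from the proposal.

```shell
#!/bin/sh
# Sketch: run a checked-in example script that follows the proposed
# MITgcm/tools/examples/${SYSTEM_NAME}/${SCRIPT_NAME} layout.
# Arguments: path to the MITgcm checkout, system name, script name.
run_example() {
  mitgcm_root=$1
  system_name=$2
  script_name=$3
  script="$mitgcm_root/tools/examples/$system_name/$script_name"
  if [ ! -f "$script" ]; then
    echo "no example script at $script" >&2
    return 1
  fi
  # Assumes a Bourne-shell script; a csh example would need csh instead.
  sh "$script"
}
```

For example, run_example "$HOME/MITgcm" orion run_testreport.sh would run MITgcm/tools/examples/orion/run_testreport.sh if such a script had been checked in (the checkout path and script name here are made up).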
Or do you folks have any better ideas?
Ed
--
Edward H. Hill III, PhD
office: MIT Dept. of EAPS; Rm 54-1424; 77 Massachusetts Ave.
Cambridge, MA 02139-4307
emails: eh3 at mit.edu ed at eh3.com
URLs: http://web.mit.edu/eh3/ http://eh3.com/
phone: 617-253-0098
fax: 617-253-4464