[MITgcm-support] problem with LAM mpi + xlf

Tue Dec 6 18:22:23 EST 2005

Hello all,

Again about darwinism and MPI:

I'm trying to use a dual G5 (2 CPUs on 1 node) with LAM 7.1.1 and IBM 
xlf 8.1 (Xserve cluster under Mac OS X 10.3).
(This configuration is already being used to run other models: POM, ROMS.)

The model either crashes (it suddenly starts to output 'NaN' values) or 
reports an I/O error like this:

 cg2d: Sum(rhs),rhsMax =   1.17979101234927E+02  5.263219991MPI_Recv: 
message truncated: Input/output error (rank 1, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD):  - MPI_Recv()
Rank (1, MPI_COMM_WORLD):  - main()

With the non-MPI version and the same input, neither of the above happens.

Also, both with and without MPI, the output shows "usingMPI =    F"  (I 
don't know if that's normal).
Right now I'm using checkpoint57x_post.

In general, how sensitive is the model supposed to be to the number of 
CPUs?

One of the examples is a modified version of the 'exp1' verification 
experiment. The options/input can be found at
http://orchard.ocean.washington.edu/dleonov/exp1mod.tgz
(120 KB)

Hopefully I'm doing something wrong.

Regards,
Dmitri