[MITgcm-support] problem with LAM mpi + xlf
Samar Khatiwala
spk at ldeo.columbia.edu
Tue Dec 6 23:22:08 EST 2005
Dmitri,
I suggest switching from LAM to Open MPI (its successor) or MPICH2.
I too was having trouble with LAM (with another software library) on a dual G5,
and was told that LAM is no longer being developed. I decided to switch to
MPICH, and everything has been fine since. Also, there is someone here who
uses Open MPI on a G5 cluster and is very happy with it.
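Incidentally, the "usingMPI = F" in your output suggests the executable may
not actually be running with MPI enabled. As far as I know, MPI support has
to be switched on both at build time and at run time; roughly something like
this (the flags and paths below are illustrative and may differ for your
genmake2 version and setup):

```shell
# Build with MPI support compiled in (defines ALLOW_USE_MPI);
# the -mods path here is just an example.
./genmake2 -mpi -mods=../code
make depend && make

# Launch one process per CPU on the dual G5.
mpirun -np 2 ./mitgcmuv
```

and the run-time flag is set in the eedata namelist:

```
 &EEPARMS
 usingMPI=.TRUE.,
 &
```

With that flag set, the standard output should then report "usingMPI = T"
instead of F.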
Samar
On Dec 6, 2005, at 6:22 PM, Dmitri Leonov wrote:
> Hello all,
>
> Again about darwinism and mpi:
>
> I'm trying to use a dual G5 (2 CPUs on 1 node) with LAM 7.1.1
> and IBM xlf 8.1 (Xserve cluster under MacOS 10.3).
> (This configuration is being used for running other models: POM, ROMS.)
>
> The model either crashes (suddenly starting to output 'NaN' values)
> or reports an I/O error like this:
>
> cg2d: Sum(rhs),rhsMax = 1.17979101234927E+02
> 5.263219991MPI_Recv: message truncated: Input/output error (rank 1,
> MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
> Rank (1, MPI_COMM_WORLD): - main()
>
> With the non-mpi version and the same input, neither of the above
> happens.
>
> Also, both with and without MPI, output shows "usingMPI =
> F" (don't know if that's normal).
> Right now I'm using checkpoint57x_post.
>
> In general, how sensitive is the model supposed to be to the number
> of CPU's?
>
> One of the examples is a modified version of 'exp1' verification
> experiment. The options/input can be found at
> http://orchard.ocean.washington.edu/dleonov/exp1mod.tgz
> (120 kb)
>
> Hopefully I'm doing something wrong.
>
> Regards,
> Dmitri
>
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support