[MITgcm-support] problem with LAM mpi + xlf
Samar Khatiwala
spk at ldeo.columbia.edu
Tue Dec 6 23:22:08 EST 2005
Dmitri,
I suggest switching from LAM to Open MPI (its successor) or MPICH2.
I too was having trouble with LAM (with another software library) on a dual G5,
and was told that LAM is no longer being developed. I decided to switch to
MPICH, and everything has been fine since. Also, there is someone here who
uses Open MPI on a G5 cluster and is very happy with it.
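Incidentally, the "usingMPI = F" in your output suggests the executable may
not actually be running with MPI enabled. As far as I know, MPI support has
to be switched on both at build time and at run time; roughly something like
this (the flags and paths below are illustrative and may differ for your
genmake2 version and setup):

```shell
# Build with MPI support compiled in (defines ALLOW_USE_MPI);
# the -mods path here is just an example.
./genmake2 -mpi -mods=../code
make depend && make

# Launch one process per CPU on the dual G5.
mpirun -np 2 ./mitgcmuv
```

and the run-time flag is set in the eedata namelist:

```
 &EEPARMS
 usingMPI=.TRUE.,
 &
```

With that flag set, the standard output should then report "usingMPI = T"
instead of F.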
Samar
On Dec 6, 2005, at 6:22 PM, Dmitri Leonov wrote:
> Hello all,
>
> Again about darwinism and mpi:
>
> I'm trying to use a dual G5 (2 CPUs on 1 node) with LAM 7.1.1
> and IBM xlf 8.1 (Xserve cluster under MacOS 10.3).
> (This configuration is being used for running other models: POM, ROMS.)
>
> The model either crashes (suddenly starting to output 'NaN' values)
> or reports an I/O error like this:
>
> cg2d: Sum(rhs),rhsMax = 1.17979101234927E+02
> 5.263219991MPI_Recv: message truncated: Input/output error (rank 1,
> MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
> Rank (1, MPI_COMM_WORLD): - main()
>
> With the non-mpi version and the same input, neither of the above
> happens.
>
> Also, both with and without MPI, output shows "usingMPI =
> F" (don't know if that's normal).
> Right now I'm using checkpoint57x_post.
>
> In general, how sensitive is the model supposed to be to the number
> of CPU's?
>
> One of the examples is a modified version of 'exp1' verification
> experiment. The options/input can be found at
> http://orchard.ocean.washington.edu/dleonov/exp1mod.tgz
> (120 kb)
>
> Hopefully I'm doing something wrong.
>
> Regards,
> Dmitri
>
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support