[MITgcm-support] MPI problem with MITgcm

Sergey Vinogradov svinogra at aer.com
Tue Jun 15 17:43:00 EDT 2010


Hi there,

Since our IBM ppc64 cluster was upgraded, I have been having trouble
running MITgcm in a parallel configuration.
Single-processor jobs run fine, but if I request more than one
processor, the only message I get is:

  pX_XXXXX: p4_error: interrupt SIGSEGV: 11

No additional debugging information is generated.

The MPI job appears to be submitted to the queue and dispatched to the
proper nodes, but it never gets past that error.

Following IBM's recommendations, I changed the P4_SOCKBUFSIZE and
P4_GLOBMEMSIZE environment variables, with no effect.

The upgrade changed the following:
  - RHEL 4 -> RHEL 5.2
  - xlf 9.1 -> xlf 11
  - xlc 7 -> xlc 9
The schedulers are Maui and Torque.
MPICH 1.2.7.p1 was not changed by the upgrade.

There is no indication of a system problem: other MPI codes compile
and run flawlessly.

At this point I have exhausted all the leads I can think of, so I
decided to ask around in the hope that someone has had similar issues
with MITgcm.

I admit that I am using an old version of the MITgcm code, but it ran
fine before the upgrade.

I would really appreciate any ideas on this problem!

Thank you,

Sergey Vinogradov, Ph.D.,           Staff Scientist
Atmospheric and Environmental Research, Inc.
131 Hartwell Ave., Lexington, MA 02421,  USA
Phone: 1-781-761-2256          sergey at aer.com
Fax:      1-781-761-2299       http://www.aer.com
Web page  ::        http://ocean.mit.edu/~svinogra