[MITgcm-support] MPI problem with MITgcm
Sergey Vinogradov
svinogra at aer.com
Tue Jun 15 17:43:00 EDT 2010
Hi there,
Since our IBM ppc64 cluster was upgraded, I have been having trouble
running MITgcm in a parallel configuration.
Single-processor jobs run fine,
but if I request more than one processor, the only message I get is:
pX_XXXXX: p4_error: interrupt SIGSEGV: 11
There is no additional debug info generated.
The MPI job appears to be submitted through the queue to the proper nodes,
but it never gets past that error.
Following IBM's recommendations, I changed the P4_SOCKBUFSIZE and
P4_GLOBMEMSIZE environment variables, with no effect.
The upgrade took us from RHEL 4 to RHEL 5.2, from xlf 9.1 to xlf 11,
and from xlc 7 to xlc 9.
The scheduler and resource manager are Maui and Torque.
MPICH 1.2.7.p1 was not changed by the upgrade.
There are no signs of system problems: other MPI codes compile and
run without trouble.
At this point I have exhausted all the leads I can think of, so I am
asking around in the hope that someone has had similar issues with MITgcm.
I admit that I am using an old version of the MITgcm code, but it ran
fine before the upgrade.
I would really appreciate any ideas on this problem!
Thank you,
Sergey Vinogradov, Ph.D., Staff Scientist
Atmospheric and Environmental Research, Inc.
131 Hartwell Ave., Lexington, MA 02421, USA
Phone: 1-781-761-2256 sergey at aer.com
Fax: 1-781-761-2299 http://www.aer.com
Web page :: http://ocean.mit.edu/~svinogra