[Aces-support] p4_error
Lorenzo Campo
lcampo at MIT.EDU
Mon Dec 6 17:15:33 EST 2004
Hi everyone,
I'm trying to run my model on a larger domain than in previous runs. Earlier
test runs on the cluster went fine and I never had problems, but now, every
time I launch the job, I find error messages like these at the end of the
output file:
par_init: 0 8
p0_6699: p4_error: interrupt SIGSEGV: 11
par_init: 7 8
par_init: 6 8
par_init: 5 8
par_init: 4 8
par_init: 3 8
par_init: 2 8
par_init: 1 8
p0_6699: (12.831351) net_send: could not write to fd=3, errno = 32
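As a reading aid for the log above, the two numeric codes can be decoded with a short Python sketch. This is only an illustration and not part of the original run; the helper names describe_signal and describe_errno are hypothetical, and the values assume a Linux system such as the cluster.

```python
import errno
import os
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name of a signal number (hypothetical helper)."""
    return f"signal {number} = {signal.Signals(number).name}"


def describe_errno(code: int) -> str:
    """Return the symbolic name and system message for an errno value (hypothetical helper)."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"errno {code} = {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # Codes taken from the log: "interrupt SIGSEGV: 11" and "errno = 32".
    print(describe_signal(11))
    print(describe_errno(32))
```

On Linux, signal 11 is SIGSEGV and errno 32 is EPIPE (broken pipe), so the net_send line is probably a consequence of the segmentation fault (a write to a connection whose other end had already gone away) rather than a separate failure.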
The crash happens exactly when the actual physical model starts (after the
initialization), and it now also happens when I go back to the original grid
domain! I didn't change anything in the model or in the PBS script (which is
very simple), and I didn't recompile the model. Does anyone have any idea what
is going on? I searched the web for the string "p4_error: interrupt SIGSEGV: 11",
and it seems to be a segmentation fault in a parallel run, but why does it
happen now and not in the earlier runs? The model was compiled on the cluster
with the Intel Fortran Compiler (module intel/mpich). Thanks.
lorenzo