[Aces-support] Odd mpi errors relating to MNC package
Daniel Enderton
enderton at MIT.EDU
Thu Dec 2 11:34:47 EST 2004
I started three jobs last night at around 1am, all within about 20
minutes of each other. They all came back with MPI errors (in the
PBS error files) relating to NetCDF and MNC that read something like:
ABNORMAL END: package MNC
forrtl: severe (28): CLOSE error, unit 60, file "Unknown"
Image PC Routine Line Source
mitgcmuv.O1 081F75F8 Unknown Unknown Unknown
Stack trace terminated abnormally.
p4_error: latest msg from perror: Bad file descriptor
In the PBS standard output file, the problematic part looked like this:
NetCDF ERROR: No such file or directory
MNC ERROR: ending define mode in S/R MNC_FILE_ENDDEF
p4_31316: p4_error: net_recv read: probable EOF on socket: 1
p5_27711: p4_error: net_recv read: probable EOF on socket: 1
p7_16293: p4_error: net_recv read: probable EOF on socket: 1
p3_647: p4_error: net_recv read: probable EOF on socket: 1
p2_12796: p4_error: net_recv read: probable EOF on socket: 1
rm_l_1_1848: p4_error: listener select: -1
p6_21651: p4_error: net_recv read: probable EOF on socket: 1
P4 procgroup file is pr_group.
All the STDERR files are there but have zero size. The STDOUT files
have nothing of note at the end (just the usual sea ice monitor
statistics for one of the packages that I am using).
Something else is odd: they all seem to have broken down at almost
exactly the same time, even though I did not start them anywhere near
this close together:
[enderton at itrda enderton]$ ls -l AquaC3O10/AqC3O10_C.*
-rw------- 1 enderton aces 10409734 Dec 2 04:04 AquaC3O10/AqC3O10_C.e51362
-rw------- 1 enderton aces 55986184 Dec 2 04:04 AquaC3O10/AqC3O10_C.o51362
[enderton at itrda enderton]$ ls -l AquaC3O5/AqC3O5_C.*
-rw------- 1 enderton aces 10104455 Dec 2 04:03 AquaC3O5/AqC3O5_C.e51361
-rw------- 1 enderton aces 54339284 Dec 2 04:03 AquaC3O5/AqC3O5_C.o51361
[enderton at itrda enderton]$ ls -l AquaC3O20/AqC3O20_C.*
-rw------- 1 enderton aces 9799175 Dec 2 04:04 AquaC3O20/AqC3O20_C.e51363
-rw------- 1 enderton aces 52693228 Dec 2 04:04 AquaC3O20/AqC3O20_C.o51363
I double-checked all the paths in my PBS scripts to make sure the runs
were not trying to call the same executables or otherwise crossing
paths, but everything looked fine.
Any ideas?
The running directories are:
/net/itrda/scratch-4/enderton/AquaC3O5
/net/itrda/scratch-4/enderton/AquaC3O10
/net/itrda/scratch-4/enderton/AquaC3O20
Cheers,
Daniel