[Aces-support] pickups
Matthew Mazloff
mmazloff at MIT.EDU
Mon Dec 13 09:50:36 EST 2004
Hello,
I'm having problems restarting a model from either .nc or .data
pickups.
From .nc files the program just freezes, with no stderr or stdout
output, and I eventually have to qdel it.
From .data I get the stderr message:
cp: cannot create regular file `./mitgcmuv': Text file busy
ln: `./SOUTHERN_OCEAN_1x1': File exists
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
do_ud: end of file
apparent state: unit 9 named pickup.0000065880.data
lately reading direct unformatted external IO
/usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1: 3051 Aborted (core dumped)
/net/itrda/scratch-2/mmazloff/southern_ocean/exemp6/./mitgcmuv -p4pg
/net/itrda/scratch-2/mmazloff/southern_ocean/exemp6/PI2883 -p4wd
/net/itrda/scratch-2/mmazloff/southern_ocean/exemp6
and my stdout file reads:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/var/torque-1.0.1p6/aux/52041.itrda
a54-1727-071
aE34-500-057
aE34-500-079
aE34-500-054
aE34-500-052
aE34-500-051
Started program at : Mon Dec 13 09:38:42 EST 2004
p4_18706: p4_error: net_recv read: probable EOF on socket: 1
p1_19442: p4_error: net_recv read: probable EOF on socket: 1
p2_19307: p4_error: net_recv read: probable EOF on socket: 1
p3_19187: p4_error: net_recv read: probable EOF on socket: 1
p5_19558: p4_error: net_recv read: probable EOF on socket: 1
bm_list_3052: (37.315062) wakeup_slave: unable to interrupt slave 0 pid 3051
bm_list_3052: (37.315269) wakeup_slave: unable to interrupt slave 0 pid 3051
bm_list_3052: (37.315391) wakeup_slave: unable to interrupt slave 0 pid 3051
bm_list_3052: (37.315513) wakeup_slave: unable to interrupt slave 0 pid 3051
bm_list_3052: (37.315624) wakeup_slave: unable to interrupt slave 0 pid 3051
Ended program at : Mon Dec 13 09:39:20 EST 2004
I am probably doing something naive once again, but I am not sure what,
since it looks to me as though the model is reading in the pickup file
fine. I am guessing the line
/usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1: 3051 Aborted
is key to understanding why it crashes/freezes.
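The "do_ud: end of file" message in the stderr output, raised during a
direct unformatted read of pickup.0000065880.data, is what a Fortran
runtime typically reports when a requested record lies past the end of
the file. One way to test that possibility is to compare the size of the
pickup file with the record length the model expects. The sketch below
is only an illustration in Python: the helper names and the record size
are hypothetical, and the real record length depends on the model grid
and precision, which are not given here.

```python
import os


def records_available(path, record_bytes):
    """Return how many complete fixed-length records the file holds."""
    if record_bytes <= 0:
        raise ValueError("record_bytes must be positive")
    return os.path.getsize(path) // record_bytes


def can_read_record(path, record_bytes, record_index):
    """True if the 1-based record_index lies entirely within the file."""
    return 1 <= record_index <= records_available(path, record_bytes)


if __name__ == "__main__":
    pickup = "pickup.0000065880.data"  # file named in the error message
    record_bytes = 360 * 80 * 8        # hypothetical: one 2-D field of 8-byte reals
    if os.path.exists(pickup):
        n = records_available(pickup, record_bytes)
        print(f"{pickup}: {n} complete records of {record_bytes} bytes")
```

If the file holds fewer complete records than the model tries to read,
that would be consistent with the end-of-file abort; it would not by
itself explain the freeze with .nc pickups.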
Any help would be greatly appreciated,
Thanks,
Matt