[Aces-support] pickups

Matthew Mazloff mmazloff at MIT.EDU
Mon Dec 13 09:50:36 EST 2004


Hello,

I'm having problems restarting a model from either .nc or .data
pickups.
 From .nc files the program just freezes, with nothing written to stderr
or stdout, and I eventually have to qdel the job.

 From .data I get the stderr message:
cp: cannot create regular file `./mitgcmuv': Text file busy
ln: `./SOUTHERN_OCEAN_1x1': File exists
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
Skipping namelist "OBCS_PARM02": seeking namelist "OBCS_PARM03".
do_ud: end of file
apparent state: unit 9 named pickup.0000065880.data
lately reading direct unformatted external IO
/usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1:  3051 Aborted                 (core dumped) /net/itrda/scratch-2/mmazloff/southern_ocean/exemp6/./mitgcmuv -p4pg /net/itrda/scratch-2/mmazloff/southern_ocean/exemp6/PI2883 -p4wd /net/itrda/scratch-2/mmazloff/southern_ocean/exemp6
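
The "do_ud: end of file" message on unit 9 suggests the read ran past the
end of pickup.0000065880.data, which could happen if the file is shorter
than the model expects.  A minimal sketch of a size check follows, assuming
the .data pickup is a flat binary file of 8-byte values with no record
markers.  The function names and the grid dimensions nx, ny are
placeholders and are not taken from this setup.

```python
import os


def record_layout(size_bytes, nx, ny, bytes_per_value=8):
    """Split a file size into whole nx*ny records plus leftover bytes.

    Assumes (not confirmed by the log) that the file is a plain sequence
    of 2-D horizontal slabs with no Fortran record markers.
    """
    record_bytes = nx * ny * bytes_per_value
    return divmod(size_bytes, record_bytes)


def pickup_looks_complete(path, nx, ny, expected_records, bytes_per_value=8):
    """Return True if the file holds at least expected_records whole slabs."""
    size = os.path.getsize(path)
    records, leftover = record_layout(size, nx, ny, bytes_per_value)
    return leftover == 0 and records >= expected_records
```

A nonzero leftover, or fewer records than the model tries to read, would be
consistent with the end-of-file abort, although other causes are possible.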

and my stdout file reads:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
/var/torque-1.0.1p6/aux/52041.itrda
a54-1727-071
aE34-500-057
aE34-500-079
aE34-500-054
aE34-500-052
aE34-500-051
Started program at : Mon Dec 13 09:38:42 EST 2004
p4_18706:  p4_error: net_recv read:  probable EOF on socket: 1
p1_19442:  p4_error: net_recv read:  probable EOF on socket: 1
p2_19307:  p4_error: net_recv read:  probable EOF on socket: 1
p3_19187:  p4_error: net_recv read:  probable EOF on socket: 1
p5_19558:  p4_error: net_recv read:  probable EOF on socket: 1
bm_list_3052: (37.315062) wakeup_slave: unable to interrupt slave 0 pid 3051
bm_list_3052: (37.315269) wakeup_slave: unable to interrupt slave 0 pid 3051
bm_list_3052: (37.315391) wakeup_slave: unable to interrupt slave 0 pid 3051
bm_list_3052: (37.315513) wakeup_slave: unable to interrupt slave 0 pid 3051
bm_list_3052: (37.315624) wakeup_slave: unable to interrupt slave 0 pid 3051
Ended program at : Mon Dec 13 09:39:20 EST 2004

I am probably doing something naive once again, but I am not sure what,
as I believe the model is reading in the pickup file fine.  I am guessing
that the line
/usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1:  3051 Aborted
is key to understanding why it crashes/freezes.
Any help would be greatly appreciated,
Thanks,
Matt



