[Aces-support] job exit unexpected on geo

aces-admin at techsquare.com
Tue Apr 3 16:34:33 EDT 2007


hello lurr-

please re-test && confirm.

[greg]


> Date: Tue, 03 Apr 2007 16:12:00 -0400
> From: Richard Lu <lurr at mit.edu>
> MIME-Version: 1.0
> Cc: 
> Reply-To: ACES-support at mitgcm.org
> 
> Thanks for the information.
> However, the problem still persists, but now with a different
> error message.
> If you go to /home/lurr/scratch/s10/erlsmp/focaldepth_camodel
> and submit that job:
> 
> [lurr at geo:~/scratch/s10/erlsmp/focaldepth_camodel]
> $ qsub q-lam.csh
> The job quits immediately, and the error message is:
> [lurr at geo:~/scratch/s10/erlsmp/focaldepth_camodel]
> $ cat fd3d_lam.stderr
> /home/lurr/scratch/s10/erlsmp/focaldepth_camodel: No such file or directory.
> 
> What is even stranger is that I cannot submit the same job on ao.
> When I submit it, qsub gives me a job ID, but the job does not appear
> in the output of qstat -a. Do you know what's going on? Thanks.
> [lurr at ao:~/scratch/s10/erlsmp/focaldepth_camodel]
> $ qsub q-lam.csh
> 59223.ao
> [lurr at ao:~/scratch/s10/erlsmp/focaldepth_camodel]
> $ qstat -a
> 
> ao:
>                                                                     Req'd  Req'd   Elap
> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
> 58896.ao             usagi    long     bend_freq   19903     1   1    --  24:00 R 23:47
> 59164.ao             yohai    long     jcm1r         --      8   1    --  24:00 R 16:00
> 59172.ao             lurr     long     STDIN       17218     1   1    --  24:00 R 15:59
> 59188.ao             utke     long     STDIN       17256     1   1    --  24:00 R 05:27
> 59195.ao             timw     long     STDIN       32661     1   1    --  24:00 R 02:41
> 59214.ao             billboos four-twe nowish        --     24   1    --  12:00 R 00:10
> 
> 
> aces-admin at techsquare.com wrote:
> > hello lurr-
> > 
> > i have made some changes to the
> > exports file at geo that should 
> > help out here. 
> > 
> > the short story is that the compute-
> > node you were assigned had so many 
> > network connexions that it was forced
> > to use a high-port to request the nfs-
> > mount of your home directory. by default,
> > such requests are refused and your job
> > would fail out straight-away.
> > 
> > i have enabled such requests to be accepted
> > from the compute-nodes. 
> > 
> > [greg]
> > 
> > 
> >> Date: Mon, 02 Apr 2007 20:43:23 -0400
> >> From: Rongrong Lu <lurr at mit.edu>
> >> MIME-Version: 1.0
> >> Cc: 
> >> Reply-To: ACES-support at mitgcm.org
> >>
> >> I ran into the same problem today that Yang reported.
> >> The job quits right after I submit it, with the following error message:
> >> -bash: line 1: /var/torque/mom_priv/jobs/3605.geo.SC: No such file or directory
> >> The same job was submitted successfully yesterday without any problem.
> >>
> >>
> >> Rongrong Lu
> >>
> >> --------------------------------------------
> >> Earth Resources Laboratory, MIT
> >> 42 Carleton St. E34-370, Cambridge, MA 02142
> >> Tel:     617-253-7835 (o)  617-230-6729 (m)
> >> Email:   lurr at mit.edu
> >> Web:     http://web.mit.edu/lurr
> >> --------------------------------------------
> >>
> >> Yang Zhang wrote:
> >>> Hi,
> >>>
> >>> A job of mine exited for no apparent reason just after I submitted it
> >>> on geo.  It worked well when I ran it in interactive mode on geo.  The
> >>> job ID is 3604.geo.  Can you help check this for me?  This problem also
> >>> happened a couple of weeks ago, and it seems to still be there.
> >>>
> >>> Thanks,
> >>> Yang
> >>> _______________________________________________
> >>> Aces-support mailing list
> >>> Aces-support at acesgrid.org
> >>> http://acesgrid.org/mailman/listinfo/aces-support
> 
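
The high-port refusal described in the quoted message from aces-admin
matches, on a Linux NFS server, the default `secure` export option, which
rejects requests from source ports of 1024 and above unless `insecure` is
set for that client. The thread does not show the actual exports file on
geo, so the following is only a minimal sketch under that assumption: the
paths, hostnames, and the helper name `clients_refusing_high_ports` are
hypothetical, and the parser handles only simple one-line entries (no line
continuations, no whitespace inside the option list).

```python
import re

# Hypothetical exports entries, for illustration only; not taken from geo.
SAMPLE_EXPORTS = """
# path    client(options) ...
/home     node001(rw,sync)  node002(rw,sync,insecure)
/scratch  *.cluster(rw,insecure)
"""


def clients_refusing_high_ports(exports_text):
    """Return (path, client) pairs whose option list lacks 'insecure'.

    Without 'insecure', the default 'secure' behaviour applies and mount
    requests from ports >= 1024 are refused for that client.
    """
    refused = []
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path, *entries = line.split()
        for entry in entries:
            # An entry looks like "client(opt1,opt2)" or just "client".
            match = re.fullmatch(r"([^()]*)(?:\(([^()]*)\))?", entry)
            if match is None:
                continue  # skip anything this simple sketch cannot parse
            client = match.group(1)
            options = (match.group(2) or "").split(",")
            if "insecure" not in options:
                refused.append((path, client))
    return refused


if __name__ == "__main__":
    for path, client in clients_refusing_high_ports(SAMPLE_EXPORTS):
        print(f"{client} cannot mount {path} from a high port")
```

On a real server the effective options can also be inspected with
`exportfs -v`, which lists each export together with its active flags.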