[Aces-support] job exit unexpected on geo
aces-admin at techsquare.com
Tue Apr 3 16:34:33 EDT 2007
hello lurr-
please re-test && confirm.
[greg]
> Date: Tue, 03 Apr 2007 16:12:00 -0400
> From: Richard Lu <lurr at mit.edu>
> Reply-To: ACES-support at mitgcm.org
>
> Thanks for the information.
> However, the problem still persists, but it now gives a different
> error message.
> If you go to /home/lurr/scratch/s10/erlsmp/focaldepth_camodel
> and submit that job:
>
> [lurr at geo:~/scratch/s10/erlsmp/focaldepth_camodel]
> $ qsub q-lam.csh
> The job quits immediately, and the error message is:
> [lurr at geo:~/scratch/s10/erlsmp/focaldepth_camodel]
> $ cat fd3d_lam.stderr
> /home/lurr/scratch/s10/erlsmp/focaldepth_camodel: No such file or directory.
>
> What is even stranger is that I cannot run the same job on ao.
> When I submit the job it gives me a jobid, but the jobid does not appear
> when I do a qstat -a. Do you know what's going on? Thanks.
> [lurr at ao:~/scratch/s10/erlsmp/focaldepth_camodel]
> $ qsub q-lam.csh
> 59223.ao
> [lurr at ao:~/scratch/s10/erlsmp/focaldepth_camodel]
> $ qstat -a
>
> ao:
>                                                                    Req'd  Req'd   Elap
> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
> 58896.ao             usagi    long     bend_freq   19903     1   1     -- 24:00 R 23:47
> 59164.ao             yohai    long     jcm1r          --     8   1     -- 24:00 R 16:00
> 59172.ao             lurr     long     STDIN       17218     1   1     -- 24:00 R 15:59
> 59188.ao             utke     long     STDIN       17256     1   1     -- 24:00 R 05:27
> 59195.ao             timw     long     STDIN       32661     1   1     -- 24:00 R 02:41
> 59214.ao             billboos four-twe nowish         --    24   1     -- 12:00 R 00:10
>
>
> aces-admin at techsquare.com wrote:
> > hello lurr-
> >
> > i have made some changes to the
> > exports file at geo that should
> > help out here.
> >
> > the short story is that the compute-
> > node you were assigned had so many
> > network connexions that it was forced
> > to use a high-port to request the nfs-
> > mount of your home directory. by default,
> > such requests are refused and your job
> > would fail out straight-away.
> >
> > i have enabled such requests to be accepted
> > from the compute-nodes.
> >
> > [greg]
> >
> >
> >> Date: Mon, 02 Apr 2007 20:43:23 -0400
> >> From: Rongrong Lu <lurr at mit.edu>
> >> Reply-To: ACES-support at mitgcm.org
> >>
> >> I ran into the same problem today that Yang reported.
> >> The job quit right after I submitted it, with the following error message:
> >> -bash: line 1: /var/torque/mom_priv/jobs/3605.geo.SC: No such file or directory
> >> The same job was submitted successfully yesterday without any problem.
> >>
> >>
> >> Rongrong Lu
> >>
> >> --------------------------------------------
> >> Earth Resources Laboratory, MIT
> >> 42 Carleton St. E34-370, Cambridge, MA 02142
> >> Tel: 617-253-7835 (o) 617-230-6729 (m)
> >> Email: lurr at mit.edu
> >> Web: http://web.mit.edu/lurr
> >> --------------------------------------------
> >>
> >> Yang Zhang wrote:
> >>> Hi,
> >>>
> >>> I have a job that exited for no apparent reason right after I submitted
> >>> it on geo. It worked well when I ran it in interactive mode on geo. The
> >>> jobid of my job is: 3604.geo. Can you help check this for me? This problem
> >>> also happened a couple of weeks ago, and it seems to still be there.
> >>>
> >>> Thanks,
> >>> Yang
> >>> _______________________________________________
> >>> Aces-support mailing list
> >>> Aces-support at acesgrid.org
> >>> http://acesgrid.org/mailman/listinfo/aces-support
>
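A note on the exports change described in the quoted message from [greg]: on a Linux NFS server, accepting requests that come from source ports of 1024 and above is normally controlled by the "insecure" export option (the default, "secure", refuses them). The thread does not show the actual exports file on geo, so the sketch below is illustrative only. The function name, the export path, and the client range are all assumptions, not values taken from geo.

```shell
#!/bin/sh
# Sketch only: allows_high_ports is a hypothetical helper, and the export
# entry used in the example call is made up for illustration.
#
# Prints "yes" if the first option list of an exports(5) entry contains the
# "insecure" option, "no" otherwise. Options are compared whole, so
# "insecure_locks" (a different option) is not mistaken for "insecure".
allows_high_ports() {
    # Keep only the text between the first "(" and the next ")".
    opts=${1#*"("}
    opts=${opts%%")"*}

    result=no
    old_ifs=$IFS
    IFS=,
    for opt in $opts; do
        if [ "$opt" = "insecure" ]; then
            result=yes
        fi
    done
    IFS=$old_ifs

    echo "$result"
}

# Example call with an assumed entry; after editing the real exports file,
# the server would typically be told to re-read it with "exportfs -ra".
allows_high_ports '/home 10.0.0.0/255.255.255.0(rw,sync,insecure)'
```

The helper only inspects the first parenthesised option list on the line, which is enough for a quick check of a single-client entry; an entry that lists several clients would need each option list checked in turn.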