[Aces-support] job exit unexpected on geo
Richard Lu
lurr at MIT.EDU
Tue Apr 3 16:12:00 EDT 2007
Thanks for the information.
However, the problem still persists, though it now gives a different
error message.
If you go to /home/lurr/scratch/s10/erlsmp/focaldepth_camodel
and submit that job:
[lurr at geo:~/scratch/s10/erlsmp/focaldepth_camodel]
$ qsub q-lam.csh
The job quits immediately, and the error message is:
[lurr at geo:~/scratch/s10/erlsmp/focaldepth_camodel]
$ cat fd3d_lam.stderr
/home/lurr/scratch/s10/erlsmp/focaldepth_camodel: No such file or directory.
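One way to narrow this down, as a minimal sketch: the message may mean that the compute node cannot see the submission directory, so a small check at the top of a job script could report the node name and whether the directory is visible from it. The helper name check_workdir is hypothetical, and the sketch assumes a Bourne-style shell rather than the csh used by q-lam.csh.

```shell
# Hypothetical helper for the top of a job script: report which node the job
# landed on and whether the given directory is visible from that node.
check_workdir() {
    dir=$1
    if [ -d "$dir" ]; then
        echo "$(uname -n): found $dir"
    else
        # Diagnostics go to stderr so they end up in the job's .stderr file.
        echo "$(uname -n): cannot see $dir" >&2
        return 1
    fi
}
```

In a Torque job script this could be called as `check_workdir "$PBS_O_WORKDIR" || exit 1`, since PBS_O_WORKDIR holds the directory qsub was run from.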
Even stranger, I cannot submit the same job on ao either.
qsub accepts the job and gives me a job ID, but the job ID does not show up
when I run qstat -a. Do you know what's going on? Thanks.
[lurr at ao:~/scratch/s10/erlsmp/focaldepth_camodel]
$ qsub q-lam.csh
59223.ao
[lurr at ao:~/scratch/s10/erlsmp/focaldepth_camodel]
$ qstat -a
ao:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
58896.ao             usagi    long     bend_freq   19903     1   1     -- 24:00 R 23:47
59164.ao             yohai    long     jcm1r          --     8   1     -- 24:00 R 16:00
59172.ao             lurr     long     STDIN       17218     1   1     -- 24:00 R 15:59
59188.ao             utke     long     STDIN       17256     1   1     -- 24:00 R 05:27
59195.ao             timw     long     STDIN       32661     1   1     -- 24:00 R 02:41
59214.ao             billboos four-twe nowish         --    24   1     -- 12:00 R 00:10
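The check described above (is a given job ID present in the qstat -a listing?) can be sketched as a small filter. The helper name job_listed is hypothetical; it only reads a listing on stdin and matches the first column, so it says nothing about why a job is missing.

```shell
# Hypothetical helper: read `qstat -a` output on stdin and succeed only if
# some row has the given job ID in its first column.
job_listed() {
    job_id=$1
    awk -v id="$job_id" '$1 == id { found = 1 } END { exit !found }'
}
```

For example, `qstat -a | job_listed 59223.ao || qstat -f 59223.ao` would fall back to asking for the full status of the job when it is not in the listing. If the job has already exited, the server may no longer know the ID, depending on how it is configured.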
aces-admin at techsquare.com wrote:
> hello lurr-
>
> i have made some changes to the
> exports file at geo that should
> help out here.
>
> the short story is that the compute-
> node you were assigned had so many
> network connexions that it was forced
> to use a high-port to request the nfs-
> mount of your home directory. by default,
> such requests are refused and your job
> would fail out straight-away.
>
> i have enabled such requests to be accepted
> from the compute-nodes.
>
> [greg]
>
>
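For reference, a rough sketch of how the change described above could be checked. On a Linux NFS server, exports use the "secure" option by default, which refuses requests from ports above 1023, and the "insecure" option lifts that restriction. Whether geo's exports file uses exactly this option is an assumption, and the helper name has_insecure_export is hypothetical.

```shell
# Hypothetical helper: succeed if an exports-style file lists the
# "insecure" option on a line that exports the given path.
has_insecure_export() {
    exports_file=$1
    export_path=$2
    # Comment lines never match because their first field is not the path.
    # The bracket expressions keep "insecure_locks" from matching.
    awk -v p="$export_path" '
        $1 == p && $0 ~ /[(,]insecure[,)]/ { found = 1 }
        END { exit !found }
    ' "$exports_file"
}
```

On the server this might be run as `has_insecure_export /etc/exports /home`, assuming /home is the exported path.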
>> Date: Mon, 02 Apr 2007 20:43:23 -0400
>> From: Rongrong Lu <lurr at mit.edu>
>> MIME-Version: 1.0
>> Cc:
>> Reply-To: ACES-support at mitgcm.org
>>
>> I ran into the same problem today that Yang did.
>> The job quit right after I submitted it, with the following error message:
>> -bash: line 1: /var/torque/mom_priv/jobs/3605.geo.SC: No such file or directory
>> The same job was submitted successfully yesterday without any problem.
>>
>>
>> Rongrong Lu
>>
>> --------------------------------------------
>> Earth Resources Laboratory, MIT
>> 42 Carleton St. E34-370, Cambridge, MA 02142
>> Tel: 617-253-7835 (o) 617-230-6729 (m)
>> Email: lurr at mit.edu
>> Web: http://web.mit.edu/lurr
>> --------------------------------------------
>>
>> Yang Zhang wrote:
>>> Hi,
>>>
>>> A job of mine exited for no apparent reason right after I submitted it on
>>> geo. It worked well when I ran it in interactive mode on geo. The job ID
>>> is 3604.geo. Can you help check this for me? This problem first
>>> happened a couple of weeks ago, and it still seems to be there.
>>>
>>> Thanks,
>>> Yang
>>> _______________________________________________
>>> Aces-support mailing list
>>> Aces-support at acesgrid.org
>>> http://acesgrid.org/mailman/listinfo/aces-support