[Aces-support] job exit unexpected on geo

Richard Lu lurr at MIT.EDU
Tue Apr 3 16:12:00 EDT 2007


Thanks for the information.
However, the problem still persists, but it now gives a different
error message.
If you go to /home/lurr/scratch/s10/erlsmp/focaldepth_camodel
and submit that job:

[lurr at geo:~/scratch/s10/erlsmp/focaldepth_camodel]
$ qsub q-lam.csh
The job quits immediately, and the error message is:
[lurr at geo:~/scratch/s10/erlsmp/focaldepth_camodel]
$ cat fd3d_lam.stderr
/home/lurr/scratch/s10/erlsmp/focaldepth_camodel: No such file or directory.
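
One way to narrow this down is to check whether the directory is visible on the node where the job actually runs. The sketch below is only an illustration: check_workdir is a hypothetical helper name, and it assumes the job script can call it before doing any real work.

```shell
#!/bin/sh
# Hypothetical helper (not part of q-lam.csh): report whether a directory
# is visible on the host where this script runs.
check_workdir() {
    dir="$1"
    if [ -d "$dir" ]; then
        echo "ok: $dir is visible on $(uname -n)"
        return 0
    else
        echo "missing: $dir is not visible on $(uname -n)" >&2
        return 1
    fi
}
```

Inside a Torque job this could be called as check_workdir "$PBS_O_WORKDIR", since Torque sets PBS_O_WORKDIR to the directory qsub was run from.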

What is even stranger is that I cannot submit the same job on ao either.
I submitted the job and it gave me a job ID; however, I cannot see that job ID
when I run qstat -a. Do you know what's going on? Thanks.
[lurr at ao:~/scratch/s10/erlsmp/focaldepth_camodel]
$ qsub q-lam.csh
59223.ao
[lurr at ao:~/scratch/s10/erlsmp/focaldepth_camodel]
$ qstat -a

ao:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
58896.ao             usagi    long     bend_freq   19903     1   1    --  24:00 R 23:47
59164.ao             yohai    long     jcm1r         --      8   1    --  24:00 R 16:00
59172.ao             lurr     long     STDIN       17218     1   1    --  24:00 R 15:59
59188.ao             utke     long     STDIN       17256     1   1    --  24:00 R 05:27
59195.ao             timw     long     STDIN       32661     1   1    --  24:00 R 02:41
59214.ao             billboos four-twe nowish        --     24   1    --  12:00 R 00:10


aces-admin at techsquare.com wrote:
> hello lurr-
> 
> i have made some changes to the
> exports file at geo that should 
> help out here. 
> 
> the short story is that the compute-
> node you were assigned had so many 
> network connexions that it was forced
> to use a high-port to request the nfs-
> mount of your home directory. by default,
> such requests are refused and your job
> would fail out straight-away.
> 
> i have enabled such requests to be accepted
> from the compute-nodes. 
> 
> [greg]
> 
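
For reference, on a Linux NFS server the kind of change described above is usually made with the "insecure" export option, which accepts requests from source ports above 1023 (the default, "secure", refuses them). The fragment below is only a sketch of what such an entry might look like: the export path, the client pattern, and the other options are assumptions, not the actual contents of the exports file on geo.

```
# /etc/exports (illustrative entry; path and client pattern are assumed)
# "insecure" allows NFS requests that originate from non-reserved ports.
/home    node*(rw,sync,insecure)
```

After editing the file, the export table is typically reloaded with "exportfs -ra" run as root on the server.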
> 
>> Date: Mon, 02 Apr 2007 20:43:23 -0400
>> From: Rongrong Lu <lurr at mit.edu>
>> MIME-Version: 1.0
>> Cc: 
>> Reply-To: ACES-support at mitgcm.org
>>
>> I ran into the same problem today that Yang did.
>> The job quit right after I submitted it, with the following error message:
>> -bash: line 1: /var/torque/mom_priv/jobs/3605.geo.SC: No such file or directory
>> The same job was submitted successfully yesterday without any problem.
>>
>>
>> Rongrong Lu
>>
>> --------------------------------------------
>> Earth Resources Laboratory, MIT
>> 42 Carleton St. E34-370, Cambridge, MA 02142
>> Tel:     617-253-7835 (o)  617-230-6729 (m)
>> Email:   lurr at mit.edu
>> Web:     http://web.mit.edu/lurr
>> --------------------------------------------
>>
>> Yang Zhang wrote:
>>> Hi,
>>>
>>> A job of mine exited for no apparent reason just after I submitted it
>>> on geo.  It worked well when I ran it in interactive mode on geo.  The
>>> job ID is 3604.geo.  Can you help check this for me?  This problem
>>> happened a couple of weeks ago, and it seems to still be there.
>>>
>>> Thanks,
>>> Yang
>>> _______________________________________________
>>> Aces-support mailing list
>>> Aces-support at acesgrid.org
>>> http://acesgrid.org/mailman/listinfo/aces-support


