[Aces-support] job exit unexpected on geo

Richard Lu lurr at MIT.EDU
Tue Apr 3 17:13:18 EDT 2007


It is working on geo now.
Thanks.

cheers,

Rongrong Lu

----------------------------------------------
Earth Resources Laboratory, MIT
42 Carleton St. E34-370, Cambridge, MA 02142
Tel: 617-253-7835 (office) 617-230-6729 (cell)
Email: lurr at mit.edu
Web: http://web.mit.edu/lurr
----------------------------------------------


aces-admin at techsquare.com wrote:
> hello lurr-
> 
> please re-test && confirm.
> 
> [greg]
> 
> 
>> Date: Tue, 03 Apr 2007 16:12:00 -0400
>> From: Richard Lu <lurr at mit.edu>
>> Reply-To: ACES-support at mitgcm.org
>>
>> Thanks for the information.
>> However, the problem still persists, but it now gives a different
>> error message.
>> If you go to /home/lurr/scratch/s10/erlsmp/focaldepth_camodel
>> and submit that job:
>>
>> [lurr@geo:~/scratch/s10/erlsmp/focaldepth_camodel]
>> $ qsub q-lam.csh
>> The job quits immediately, and the error message is:
>> [lurr@geo:~/scratch/s10/erlsmp/focaldepth_camodel]
>> $ cat fd3d_lam.stderr
>> /home/lurr/scratch/s10/erlsmp/focaldepth_camodel: No such file or directory.
>>
>> What is even stranger is that I cannot even submit the same job successfully on ao.
>> When I submit it, qsub gives me a job ID, but I cannot see that job ID
>> in the output of qstat -a. Do you know what's going on? Thanks.
>> [lurr@ao:~/scratch/s10/erlsmp/focaldepth_camodel]
>> $ qsub q-lam.csh
>> 59223.ao
>> [lurr@ao:~/scratch/s10/erlsmp/focaldepth_camodel]
>> $ qstat -a
>>
>> ao:
>>                                                                     Req'd  Req'd   Elap
>> Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
>> 58896.ao             usagi    long     bend_freq   19903     1   1    --  24:00 R 23:47
>> 59164.ao             yohai    long     jcm1r         --      8   1    --  24:00 R 16:00
>> 59172.ao             lurr     long     STDIN       17218     1   1    --  24:00 R 15:59
>> 59188.ao             utke     long     STDIN       17256     1   1    --  24:00 R 05:27
>> 59195.ao             timw     long     STDIN       32661     1   1    --  24:00 R 02:41
>> 59214.ao             billboos four-twe nowish        --     24   1    --  12:00 R 00:10
>>
>>
>> aces-admin at techsquare.com wrote:
>>> hello lurr-
>>>
>>> i have made some changes to the
>>> exports file at geo that should 
>>> help out here. 
>>>
>>> the short story is that the compute-
>>> node you were assigned had so many 
>>> network connexions that it was forced
>>> to use a high-port to request the nfs-
>>> mount of your home directory. by default,
>>> such requests are refused and your job
>>> would fail out straight-away.
>>>
>>> i have enabled such requests to be accepted
>>> from the compute-nodes. 
>>>
>>> [greg]
>>>
>>>
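
For reference, on a Linux NFS server the kind of change described above
is typically made with the "insecure" export option, which lets the
server accept mount requests that originate from ports above 1023. The
entry below is only a sketch under that assumption: the exported path,
the client host pattern, and the other options are hypothetical and are
not taken from the actual exports file on geo.

```
# /etc/exports (hypothetical entry; path and client pattern are assumed)
# "insecure" accepts NFS requests that originate from ports above 1023;
# the default, "secure", refuses them.
/home   compute-*(rw,sync,insecure)
```

After such an edit, running exportfs -ra on the server re-exports the
entries without restarting the NFS service.
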
>>>> Date: Mon, 02 Apr 2007 20:43:23 -0400
>>>> From: Rongrong Lu <lurr at mit.edu>
>>>> Reply-To: ACES-support at mitgcm.org
>>>>
>>>> I ran into the same problem today that Yang reported.
>>>> The job quit right after I submitted it, with the following error message:
>>>> -bash: line 1: /var/torque/mom_priv/jobs/3605.geo.SC: No such file or directory
>>>> The same job was submitted successfully yesterday without any problem.
>>>>
>>>>
>>>> Rongrong Lu
>>>>
>>>> --------------------------------------------
>>>> Earth Resources Laboratory, MIT
>>>> 42 Carleton St. E34-370, Cambridge, MA 02142
>>>> Tel:     617-253-7835 (o)  617-230-6729 (m)
>>>> Email:   lurr at mit.edu
>>>> Web:     http://web.mit.edu/lurr
>>>> --------------------------------------------
>>>>
>>>> Yang Zhang wrote:
>>>>> Hi,
>>>>>
>>>>> I have a job that exited for no apparent reason right after I submitted
>>>>> it on geo.  It worked well when I ran it in interactive mode on geo.  The
>>>>> job ID is 3604.geo.  Can you help check this for me?  This problem also
>>>>> happened a couple of weeks ago, and it still seems to be there.
>>>>>
>>>>> Thanks,
>>>>> Yang
>>>>> _______________________________________________
>>>>> Aces-support mailing list
>>>>> Aces-support at acesgrid.org
>>>>> http://acesgrid.org/mailman/listinfo/aces-support


