[Aces-support] geo job execution problem

aces-admin at techsquare.com
Tue Apr 3 11:14:11 EDT 2007


hello lurr-

after i removed ae34-500-13[35] from the queue,
all your jobs ran through to completion (and 
many at a time). i will look at both of these 
nodes as soon as i get the chance. at least one
of them is "troubled".

[greg]

> Date: Mon, 02 Apr 2007 16:42:00 -0400
> From: Richard Lu <lurr at mit.edu>
> MIME-Version: 1.0
> Cc: 
> Reply-To: ACES-support at mitgcm.org
> 
> For example, I have just submitted a batch of jobs under "one",
> and only 1 job is actually running; all the others are waiting,
> although there are enough nodes available.
> You can see the problem right now in geo.
> 
> Thanks.
> 
> cheers,
> 
> Rongrong Lu
> 
> ----------------------------------------------
> Earth Resources Laboratory, MIT
> 42 Carleton St. E34-370, Cambridge, MA 02142
> Tel: 617-253-7835 (office) 617-230-6729 (cell)
> Email: lurr at mit.edu
> Web: http://web.mit.edu/lurr
> ----------------------------------------------
> 
> 
> aces-admin at techsquare.com wrote:
> > hello lurr-
> > 
> > looks like you should be able to run 
> > 54 jobs simultaneously in the queue 'one'
> > at geo.
> > 
> >   queue one max_user_run = 56
> > 
> > i don't have detailed enough logs to 
> > find out why you had problems on friday,
> > so please let me know next time this 
> > happens. this level of detail in the logs
> > is only available for a day or so. 
> > 
> > [greg]
> > 
> > 
> >> Date: Fri, 30 Mar 2007 23:20:05 -0400
> >> From: Richard Lu <lurr at mit.edu>
> >> MIME-Version: 1.0
> >> Cc: 
> >> Reply-To: ACES-support at mitgcm.org
> >>
> >> Hi, there,
> >>
> >> I am having trouble running multiple jobs at the same time.
> >> Before the crash a couple of weeks ago, I was able to run 64 jobs at the
> >> same time using the "one" queue type. Now, however, I can only run 3
> >> jobs at a time under the "one" queue type. Has anything changed regarding
> >> the limit on simultaneously running jobs at GEO?
> >> The following is a qstat output to show this problem:
> >> [lurr at geo:~]
> >> $ qstat
> >> Job id              Name             User            Time Use S Queue
> >> ------------------- ---------------- --------------- -------- - -----
> >> 2984.geo            fd3d_lam         lurr            213:47:5 R long
> >> 2985.geo            aco2D            lurr            00:08:26 R one
> >> 2986.geo            aco2D            lurr            00:07:54 R one
> >> 2988.geo            aco2D            lurr            00:07:09 R one
> >> 2989.geo            aco2D            lurr                   0 Q one
> >> 2990.geo            aco2D            lurr                   0 Q one
> >> 2992.geo            aco2D            lurr                   0 Q one
> >> 2993.geo            aco2D            lurr                   0 Q one
> >> 2995.geo            aco2D            lurr                   0 Q one
> >> 2996.geo            aco2D            lurr                   0 Q one
> >> 2998.geo            aco2D            lurr                   0 Q one
> >> 2999.geo            aco2D            lurr                   0 Q one
> >> 3001.geo            aco2D            lurr                   0 Q one
> >>
> >>
> >> Thanks.
> >>
> >> cheers,
> >>
> >> Rongrong Lu
> >>
> >> _______________________________________________
> >> Aces-support mailing list
> >> Aces-support at acesgrid.org
> >> http://acesgrid.org/mailman/listinfo/aces-support
> >>
> 
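
When reporting a problem like the one in this thread, it can help to send a per-queue count of running and queued jobs along with the raw listing. The following is a minimal sketch, assuming the default six-column qstat layout shown in the quoted message; the function name count_job_states and the SAMPLE text are hypothetical and exist only for illustration.

```python
from collections import Counter

# A few rows copied from the qstat listing quoted in this thread.
SAMPLE = """\
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
2984.geo            fd3d_lam         lurr            213:47:5 R long
2985.geo            aco2D            lurr            00:08:26 R one
2986.geo            aco2D            lurr            00:07:54 R one
2988.geo            aco2D            lurr            00:07:09 R one
2989.geo            aco2D            lurr                   0 Q one
2990.geo            aco2D            lurr                   0 Q one
"""


def count_job_states(qstat_text):
    """Count jobs per (queue, state) pair in default-format qstat output."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows have exactly six columns: id, name, user, time, state, queue.
        # The header row splits into eight words, and the dashed separator
        # splits into six fields that all start with '-', so both are skipped.
        if len(fields) != 6 or fields[0].startswith("-"):
            continue
        state, queue = fields[4], fields[5]
        counts[(queue, state)] += 1
    return counts


if __name__ == "__main__":
    for (queue, state), n in sorted(count_job_states(SAMPLE).items()):
        print(f"queue={queue} state={state} jobs={n}")
```

To use it on a live system, the text passed to count_job_states would be the captured output of qstat rather than SAMPLE. A large Q count next to a small R count, while nodes sit idle, is the symptom described above.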


