[Aces-support] PBS question

aces-admin at techsquare.com aces-admin at techsquare.com
Fri Aug 5 09:32:32 EDT 2005


hello lcampo-

you should consider running torque (and not openpbs).
fwiw, see http://www.clusterresources.com (or www.supercluster.org).

if you really are running OpenPBS, then i suggest switching
to the latest torque as your first step. we are running 
torque-1.2.0p4 at the aces cluster. 

[greg]
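
For the symptom described in the quoted message below (every node except the master showing "state-unknown,down" in pbsnodes -a), a small script can help keep track of which nodes the server considers unavailable after each change. This is only a rough sketch, not part of the original reply. It assumes the usual pbsnodes -a layout: an unindented node name followed by indented "key = value" lines, with records separated by blank lines. The host names in SAMPLE and the function names are made up for illustration.

```python
def parse_pbsnodes(text):
    """Return {node_name: [state, ...]} from `pbsnodes -a`-style text."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None              # blank line ends a node record
        elif not line[0].isspace():
            current = line.strip()      # unindented line starts a new record
            nodes[current] = []
        elif current is not None:
            key, sep, value = line.strip().partition(" = ")
            if sep and key == "state":
                # states are comma-separated, e.g. "state-unknown,down"
                nodes[current] = value.split(",")
    return nodes


def unavailable(nodes):
    """Return the sorted names of nodes whose state marks them unusable."""
    bad = {"down", "offline", "state-unknown"}
    return sorted(name for name, states in nodes.items()
                  if bad.intersection(states))


# Hypothetical sample output; real host names and attributes will differ.
SAMPLE = """\
master
     state = free
     np = 2
     ntype = cluster

node01
     state = state-unknown,down
     np = 1
     ntype = cluster
"""

if __name__ == "__main__":
    print(unavailable(parse_pbsnodes(SAMPLE)))
```

To run it against a live server instead of the sample, the text could be taken from subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True).stdout, assuming pbsnodes is on the PATH of the machine running the script.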


> Date: Fri,  5 Aug 2005 03:28:26 -0400
> From: Lorenzo Campo <lcampo at mit.edu>
> Reply-To: ACES-support at mitgcm.org
> 
> It should be OpenPBS 2.3 (I think...). I don't know the version of MAUI; how
> could I check the version?
> 
> Quoting aces-admin at techsquare.com:
> 
> > hello lcampo-
> >
> > torque is a quickly evolving beast.
> > which version are you running ?
> >
> > [greg]
> >
> >> Date: Thu,  4 Aug 2005 12:31:20 -0400
> >> From: Lorenzo Campo <lcampo at mit.edu>
> >> Reply-To: ACES-support at mitgcm.org
> >>
> >> Hi,
> >> this is not a question about the acesgrid cluster, but any suggestion will
> >> be highly appreciated...
> >> I'm trying to build a 16-processor cluster in my Department (University of
> >> Florence, Italy). I installed everything with the OSCAR package, which set
> >> up every node and all the useful daemons without problems. The problem is
> >> that PBS reports (with the pbsnodes -a command) that every node is
> >> "state-unknown,down", apart from the master node, which is "free", for no
> >> apparent reason. Communication between the nodes and the master is not
> >> blocked (iptables is simply down, and I performed several communication
> >> tests), PBS files like "nodes" and "server" contain the right IPs and
> >> hostnames, the pbs_mom daemon is running normally on each node and on the
> >> master, and MAUI doesn't seem to have problems (or so I guess...). I created
> >> two queues with qmgr, both enabled and started, and I defined all 15 nodes
> >> when defining the server (again in qmgr), but every time I try to launch a
> >> job with qsub that uses more than one processor, it is queued
> >> indefinitely, and qstat -s says that "there are not enough resources".
> >> Any idea?
> >> Thank you very much and sorry for the off-topic question.
> >> Lorenzo
> >> _______________________________________________
> >> Aces-support mailing list
> >> Aces-support at acesgrid.org
> >> http://acesgrid.org/mailman/listinfo/aces-support
> >>
> >
> 
> 
> 


