[Aces-support] PBS question
Lorenzo Campo
lcampo at MIT.EDU
Thu Aug 4 12:31:20 EDT 2005
Hi,
this is not a question about the acesgrid cluster, but any suggestions would be
highly appreciated...
I'm trying to set up a 16-processor cluster in my department (University of
Florence, Italy). I installed everything with the OSCAR package, which set up
every node and all the needed daemons without problems. The problem is that
PBS reports (with the pbsnodes -a command) that every node is
"state-unknown,down", apart from the master node, which is "free", for no
apparent reason. Communication between the nodes and the master is not blocked
(iptables is turned off, and I performed several communication tests), the PBS
files such as "nodes" and "server" contain the right IPs and hostnames, the
pbs_mom daemon is running normally on each node and on the master, and MAUI
doesn't seem to have any problems (or so I guess...). I created two queues with
qmgr, both enabled and started, and I defined all 15 nodes when configuring the
server (again in qmgr). But every time I try to submit a job with qsub that
uses more than one processor, it stays queued indefinitely, and qstat -s says
that "there are not enough resources". Any ideas?
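As a rough sketch of how the node states reported by pbsnodes -a could be
summarized, the Python snippet below parses text in the layout that command is
assumed to print: an unindented node name followed by indented "key = value"
lines. The host names in SAMPLE and the helper names parse_pbsnodes,
unavailable and live_states are made up for illustration, and the set of
"bad" states is an assumption rather than a complete list.

```python
import subprocess

# States treated as "not usable" here (an assumption, not an exhaustive list).
BAD = {"down", "offline", "state-unknown"}

# Hypothetical sample in the layout pbsnodes -a is assumed to produce.
SAMPLE = """\
master
     state = free
     np = 2

node01
     state = state-unknown,down
     np = 2
"""


def parse_pbsnodes(text):
    """Map each node name to its reported state string (None if missing)."""
    states = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank line between node records
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            states[current] = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            if key.strip() == "state":
                states[current] = value.strip()
    return states


def unavailable(states):
    """Return the sorted names of nodes whose state includes a BAD flag."""
    return sorted(
        name
        for name, state in states.items()
        if state is None or BAD & {s.strip() for s in state.split(",")}
    )


def live_states():
    """Run pbsnodes -a on a machine where PBS is installed and parse it."""
    result = subprocess.run(
        ["pbsnodes", "-a"], capture_output=True, text=True, check=True
    )
    return parse_pbsnodes(result.stdout)


print(unavailable(parse_pbsnodes(SAMPLE)))  # prints ['node01']
```

On the real cluster, unavailable(live_states()) would list the nodes that the
server currently considers down or unknown.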
Thank you very much, and sorry for the off-topic question.
Lorenzo