[Aces-support] PBS question
Lorenzo Campo
lcampo at MIT.EDU
Mon Aug 8 17:10:53 EDT 2005
hello,
I downloaded torque 1.2.0p4, removed every trace of the previous PBS version
from three nodes (including the master), and installed torque following
exactly the procedure described in the official documentation. Everything
compiled without problems, and I started pbs_server and pbs_mom on the master
and on the two nodes (I haven't started the scheduler yet), but I still have
this problem of "nodes down". This time the master itself (node 0) is marked
"state-unknown,down" together with node 2, while node 1 is "free". I checked,
and there are no differences in installation between nodes 1 and 2 (so why
does 1 work fine while 2 doesn't?!). I tried rebooting the master and THEN
rebooting the nodes, making sure that all pbs_mom services were running, but
nothing changed. Moreover, I can't stop pbs_server (I have to reboot the
master) because the command "qterm -t quick" (or "-t immediate") just blocks
and I have to exit with Ctrl+C. The files "config", "nodes" and "server" are
fine; the installation correctly configured the server and the batch queue.
So, where is the problem this time?
Thank you
Lorenzo
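[Editor's note: a few standard diagnostics can narrow down a "state-unknown,down"
node and a hung pbs_server. This is a sketch, not taken from the thread: the
hostnames (node0, node2) and the spool path /var/spool/torque are assumptions
for a default torque 1.2 layout; adjust them to the actual install.]

```shell
# Run on the master unless noted; node0/node2 are assumed hostnames.

# 1. What does the server think each node looks like?
pbsnodes -a

# 2. Query each mom directly (momctl ships with torque).
momctl -d 3 -h node0
momctl -d 3 -h node2

# 3. Look for rejected or mismatched connections in the logs.
#    A hostname in server_priv/nodes that differs from what the mom
#    reports about itself is a classic cause of "down" nodes.
tail -50 /var/spool/torque/mom_logs/$(date +%Y%m%d)      # on each node
tail -50 /var/spool/torque/server_logs/$(date +%Y%m%d)   # on the master

# 4. If "qterm -t quick" hangs, stop the server process directly
#    instead of rebooting the master.
ps aux | grep pbs_server
kill <pid>        # SIGTERM first; use kill -9 only as a last resort
```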
>hello lcampo-
>
>you should consider running torque (and not openpbs).
>fwiw, http://www.clusterresources.com (or www.supercluster.org).
>
>if you really are running OpenPBS, then i suggest switching
>to the latest torque as your first step. we are running
>torque-1.2.0p4 at the aces cluster.
>
>[greg]
> Date: Fri, 5 Aug 2005 03:28:26 -0400
> From: Lorenzo Campo <lcampo at mit.edu>
> MIME-Version: 1.0
> Cc:
> Reply-To: ACES-support at mitgcm.org
>
> It should be OpenPBS 2.3 (I think...); I don't know the version of MAUI.
> How can I check the version?
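[Editor's note: version checks along these lines usually work; they are a
sketch, and the Maui log path below is an assumption for a default install
(MAUIHOMEDIR/log). Older OpenPBS builds may not accept --version.]

```shell
# PBS/torque: recent torque builds accept --version on the client tools.
qstat --version

# The server also exposes its version as a read-only attribute.
qmgr -c "list server" | grep -i pbs_version

# Maui: the daemon typically records its version when it starts,
# so grep the log (path assumed; check MAUIHOMEDIR in maui.cfg).
grep -i version /usr/local/maui/log/maui.log | head
```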
>
> Quoting aces-admin at techsquare.com:
>
> > hello lcampo-
> >
> > torque is a quickly evolving beast.
> > which version are you running ?
> >
> > [greg]
> >
> >> Date: Thu, 4 Aug 2005 12:31:20 -0400
> >> From: Lorenzo Campo <lcampo at mit.edu>
> >> MIME-Version: 1.0
> >> Cc:
> >> Reply-To: ACES-support at mitgcm.org
> >>
> >> Hi,
> >> this is not a question about the acesgrid cluster, but any suggestion
> >> will be highly appreciated...
> >> I'm trying to set up a cluster of 16 processors in my Department
> >> (University of Florence, Italy). I installed everything with the OSCAR
> >> package, which set up every node and all the useful daemons without
> >> problems. The problem is that PBS reports (with the "pbsnodes -a"
> >> command) that every node is "state-unknown,down" apart from the master
> >> node, which is "free", with no apparent reason. Communications between
> >> the nodes and the master are not blocked (iptables is simply down, and
> >> I performed several communication tests), the PBS files "nodes" and
> >> "server" contain the right IPs and hostnames, the pbs_mom daemon is
> >> running regularly on each node and on the master, and MAUI doesn't
> >> seem to have problems (or so I guess...). I created two queues with
> >> qmgr, both enabled and started, and I defined all 15 nodes when
> >> defining the server (again in qmgr), but every time I try to launch a
> >> job with qsub that uses more than one processor, it is queued
> >> indefinitely, and "qstat -s" says that "there are not enough
> >> resources". Any idea?
> >> Thank you very much, and sorry for the off-topic question.
> >> Lorenzo
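[Editor's note: the claims above (mom running, correct nodes file, queues
started) can be checked from the command line roughly as follows. This is a
sketch: the spool path /usr/spool/PBS is an assumption for an old OpenPBS
install (torque typically uses /var/spool/torque), and <jobid> is a
placeholder for the stuck job's id.]

```shell
# On the master:
pbsnodes -a                             # per-node state as the server sees it
cat /usr/spool/PBS/server_priv/nodes    # node list, e.g. "node1 np=2"
qmgr -c "print server"                  # server and queue configuration
qstat -f <jobid>                        # full status of the queued job,
                                        # including the scheduler's comment

# On a compute node: is pbs_mom actually up and listening?
ps aux | grep pbs_mom
netstat -ln | grep 1500                 # PBS default ports: 15001-15004
```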
> >> _______________________________________________
> >> Aces-support mailing list
> >> Aces-support at acesgrid.org
> >> http://acesgrid.org/mailman/listinfo/aces-support
> >>