[Aces-support] PBS question
aces-admin at techsquare.com
Mon Aug 8 17:18:13 EDT 2005
hello lcampo-
rather than bore all of the aces-users with
problems on your cluster, can we take this
off-list ?
fwiw, the easiest thing would be to login
and take a look. there are lots of little things
that one doesn't think of when typing email.
can't make any promises, though, as this is
all gratis, non ?
[greg]
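for reference, a few checks worth running on the master before digging in (a sketch only; the log paths assume a default TORQUE install under /var/spool/torque, and "node2" stands in for whichever node is reported down):

```shell
# list node states as the server currently sees them
pbsnodes -a

# query the pbs_mom on the down node directly (level-3 diagnostics);
# if this hangs or errors, the server/mom communication itself is broken
momctl -d 3 -h node2

# dump the server configuration for a sanity check
qmgr -c 'print server'

# recent server- and mom-side log entries often name the failing host;
# TORQUE names its log files by date (YYYYMMDD)
tail -n 50 /var/spool/torque/server_logs/$(date +%Y%m%d)
tail -n 50 /var/spool/torque/mom_logs/$(date +%Y%m%d)
```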
> Date: Mon, 8 Aug 2005 17:10:53 -0400
> From: Lorenzo Campo <lcampo at mit.edu>
> MIME-Version: 1.0
> Cc:
> Reply-To: ACES-support at mitgcm.org
>
>
> hello,
> I downloaded torque 1.2.0p4, removed everything from the previous PBS
> installation on three nodes (including the master), and installed torque
> following exactly the procedure in the official documentation. Everything
> compiled with no problems. I started pbs_server and pbs_mom on the master
> and on the two nodes (I haven't started the scheduler yet), but I still
> have this problem of "nodes down". This time the master itself (node 0)
> is marked "state-unknown,down" together with node 2, while node 1 is
> "free". I checked, and there are no differences in installation between
> nodes 1 and 2 (so why does 1 work fine while 2 doesn't?). I tried
> rebooting the master and THEN the nodes, making sure all pbs_mom services
> were running, but nothing changed. Moreover, I can't stop pbs_server
> (I have to reboot the master) because the command "qterm -t quick"
> (or -t immediate) just blocks (I have to exit with Ctrl+C). The files
> "config", "nodes" and "server" are OK, and the installation correctly
> configured the server and the batch queue. So, where is the problem
> this time?
> Thank you
> Lorenzo
>
> >hello lcampo-
> >
> >you should consider running torque (and not openpbs).
> >fwiw, http://www.clusterresources.com (or www.supercluster.org).
> >
> >if you really are running OpenPBS, then i suggest switching
> >to the latest torque as your first step. we are running
> >torque-1.2.0p4 at the aces cluster.
> >
> >[greg]
>
>
>
> > Date: Fri, 5 Aug 2005 03:28:26 -0400
> > From: Lorenzo Campo <lcampo at mit.edu>
> > MIME-Version: 1.0
> > Cc:
> > Reply-To: ACES-support at mitgcm.org
> >
> > It should be OpenPBS 2.3 (I think...). I don't know the version of MAUI;
> > how can I check the version?
> >
> > Quoting aces-admin at techsquare.com:
> >
> > > hello lcampo-
> > >
> > > torque is a quickly evolving beast.
> > > which version are you running ?
> > >
> > > [greg]
> > >
> > >> Date: Thu, 4 Aug 2005 12:31:20 -0400
> > >> From: Lorenzo Campo <lcampo at mit.edu>
> > >> MIME-Version: 1.0
> > >> Cc:
> > >> Reply-To: ACES-support at mitgcm.org
> > >>
> > >> Hi,
> > >> this is not a question about the acesgrid cluster, but any suggestion
> > >> will be highly appreciated...
> > >> I'm trying to set up a cluster of 16 processors in my department
> > >> (University of Florence, Italy). I installed everything with the OSCAR
> > >> package, which set up every node and all the useful daemons without
> > >> problems. The problem is that PBS reports (with the pbsnodes -a
> > >> command) that every node is "state-unknown,down", apart from the
> > >> master node, which is "free", for no apparent reason. Communications
> > >> between the nodes and the master are not blocked (iptables is simply
> > >> down, and I performed several communication tests), the PBS files
> > >> "nodes" and "server" contain the right IPs and hostnames, the pbs_mom
> > >> daemon is running on each node and on the master, and MAUI doesn't
> > >> seem to have problems (or so I guess...). I created two queues with
> > >> qmgr, both enabled and started, and I defined all 15 nodes when
> > >> configuring the server (again in qmgr), but every time I try to
> > >> launch a job with qsub that uses more than one processor, it is
> > >> queued indefinitely, and qstat -s says that "there are not enough
> > >> resources". Any idea?
> > >> Thank you very much, and sorry for the off-topic question.
> > >> Lorenzo
> > >> _______________________________________________
> > >> Aces-support mailing list
> > >> Aces-support at acesgrid.org
> > >> http://acesgrid.org/mailman/listinfo/aces-support
> > >>
> > >
> >
> >
> >
>
>
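on the earlier "not enough resources" symptom: one common cause (an assumption here, not something confirmed in this thread) is a server nodes file that does not declare per-node processor counts, so a multi-processor request can never be satisfied even though every node is up. a minimal TORQUE nodes file (usually $PBS_HOME/server_priv/nodes) for dual-processor nodes might look like:

```
node1 np=2
node2 np=2
node3 np=2
```

with np declared, a request such as "qsub -l nodes=2:ppn=2 job.sh" can be scheduled; with no np attribute, each node is assumed to offer a single processor. the hostnames and np=2 above are illustrative, not taken from Lorenzo's setup.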