[Aces-support] -bash: qstat : command not found

Simon McClusky simon at mit.edu
Thu Aug 4 09:47:51 EDT 2005


Peter,

There seems to be something strange about the way the ao compute nodes
are behaving. qsub commands that I issue from the geojr compute nodes are
accepted, but the same commands issued from the ao compute nodes are not.
Until this is resolved, you can run your scripts on the geojr cluster.

Greg - the problem on ao is that the compute nodes see strange info for
the torque /bin directory, where qsub resides:

?---------  ? ?    ?       ?            ? bin

I have no clue what this means!
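
For what it is worth, ls -l prints question marks in place of the mode,
owner, size and date when it can read an entry's name from the parent
directory but cannot stat() the entry itself. A stale network mount is one
common cause, though that is only a guess for ao. Below is a minimal sketch
of a check that could be run from inside a job on a compute node. The
function names check_cmd and check_dir are made up for illustration, and
the directory to test has to be supplied by hand, since the real torque
path is not given here.

```shell
# check_cmd NAME: report whether NAME resolves to a command on the current PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# check_dir DIR: report whether DIR can be stat'ed and listed.
# [ -d ] needs a successful stat() on DIR, so it fails for an entry
# that ls -l shows with question marks.
check_dir() {
    if [ -d "$1" ] && ls "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "unreadable: $1"
        return 1
    fi
}
```

Calling check_cmd qsub and then check_dir on the torque bin directory, once
from a login shell and once from a job script, should show whether the job
environment is missing the PATH entry or the directory itself.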

-Simon  
On Thu, 2005-08-04 at 03:38, Peter H Israelsson wrote:
> I am having a similar but slightly different problem with the qsub command, as
> detailed below.
> 
> First I need to explain how my jobs are run: Because my simulations are longer
> than the max walltime of the queues available to me, I have rewritten my code
> to automatically run as a sequence of smaller simulations.  The code
> automatically stops itself when its run time is approaching the max walltime,
> and signals the calling pbs script that the simulation needs to be restarted
> from its current time level.  Before exiting, the calling pbs script creates a
> new pbs script file and submits a new job, i.e., the last command it issues
> before exiting is "qsub [new_job]".  That way, the next part of the simulation
> is assigned a new job number, and the walltime is reset.
> 
> This process was working fine before the ao system went down last night, i.e.,
> the code stopped and restarted itself automatically with no problems.  However,
> since the system went down last night, this sequential process is now failing
> because it says that it cannot find the qsub command:
> /var/torque/mom_priv/jobs/8322.ao.SC: line 65: qsub: command not found
> I have tested this a number of times and get the same result each time.
> 
> So something is different on ao since yesterday's reboot.  The strange thing is
> that when I manually log on to ao.acesgrid.org, I do have access to qsub,
> qstat, etc.  So I don't understand why the 'qsub' command doesn't work when
> issued by an existing job.
> 
> Any ideas what is going on?  Thanks.
> 
> Regards,
> Peter
> 
> PS Greg, I am confused by your last email because the module 'magick' you refer
> to is not listed when I type 'module avail'.  Aren't the qsub, qstat, etc
> commands automatically loaded (in one of the 'default' modules such as
> 'torque/1.2.0p4')?  Also, I get an error when I try typing 'module load
> magick', saying that the module cannot be found.
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   Peter H. Israelsson
>   Massachusetts Institute of Technology
>   Department of Civil & Environmental Engineering
>   48-114, 15 Vassar Street, Cambridge, MA 02139, USA
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> 
> Quoting aces-admin at techsquare.com:
> 
> > hello beghein-
> >
> > are you certain that your login bits are set up
> > to load the module magick ? i just tested this
> > and it worked for me...
> >
> > . ssh ts@ao.acesgrid.org
> > . ao: module list
> > . ao: qstat
> >
> > [greg]
> >
> >> Mime-Version: 1.0
> >> Date: Wed, 3 Aug 2005 16:32:03 -0400
> >> From: Caroline Beghein <beghein at mit.edu>
> >> Cc:
> >> Reply-To: ACES-support at mitgcm.org
> >>
> >> Hi
> >>
> >> Is there still something wrong with the cluster? Whether I log in to
> >> ao or geojr, I cannot start any job. If I type qsub ... or qstat, I
> >> get "-bash: qstat : command not found"
> >> What does that mean?
> >>
> >> Thanks
> >>
> >>
> >> --
> >> 	Caroline
> >>
> >>
> >>
> >> Caroline Beghein
> >> 77 Massachusetts avenue #54-526
> >> Cambridge, MA 02139
> >> tel.: +1 617 253 3589
> >> http://www.mit.edu/~beghein
> >> _______________________________________________
> >> Aces-support mailing list
> >> Aces-support at acesgrid.org
> >> http://acesgrid.org/mailman/listinfo/aces-support
> >>
-- 
Simon McClusky
RM 54-614, Dept EAPS, MIT,
77 Massachusetts Ave,
Cambridge, MA 02139
USA

email: simon at mit.edu
Ph: 617 253-3077
Fax: 617 253-1699
Cell:857 928-5891





