[Aces-support] -bash: qstat : command not found

aces-admin at techsquare.com
Thu Aug 4 09:57:10 EDT 2005


hello peteri-

i am looking at this presently. 

[greg]

> Date: Thu,  4 Aug 2005 03:38:28 -0400
> From: Peter H Israelsson <peteri at mit.edu>
> MIME-Version: 1.0
> Cc: 
> Reply-To: ACES-support at mitgcm.org
> 
> I am having a similar but slightly different problem with the qsub command, as
> detailed below.
> 
> First I need to explain how my jobs are run: Because my simulations are longer
> than the max walltime of the queues available to me, I have rewritten my code
> to automatically run as a sequence of smaller simulations.  The code
> automatically stops itself when its run time is approaching the max walltime,
> and signals the calling pbs script that the simulation needs to be restarted
> from its current time level.  Before exiting, the calling pbs script creates a
> new pbs script file and submits a new job, i.e., the last command it issues
> before exiting is "qsub [new_job]".  That way, the next part of the simulation
> is assigned a new job number, and the walltime is reset.
> 
> This process was working fine before the ao system went down last night, i.e.,
> the code stopped and restarted itself automatically with no problems. However,
> since the system went down last night, this sequential process is now failing
> because it says that it cannot find the qsub command:
> /var/torque/mom_priv/jobs/8322.ao.SC: line 65: qsub: command not found
> I have tested this a number of times and get the same result each time.
> 
> So something is different on ao since yesterday's reboot.  The strange thing is
> that when I manually log on to ao.acesgrid.org, I do have access to qsub,
> qstat, etc.  So I don't understand why the 'qsub' command doesn't work when
> issued by an existing job.
> 
> Any ideas what is going on?  Thanks.
> 
> Regards,
> Peter
> 
> PS Greg, I am confused by your last email because the module 'magick' you refer
> to is not listed when I type 'module avail'.  Aren't the qsub, qstat, etc.
> commands automatically loaded (in one of the 'default' modules such as
> 'torque/1.2.0p4')?  Also, I get an error when I try typing 'module load
> magick', saying that the module cannot be found.
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   Peter H. Israelsson
>   Massachusetts Institute of Technology
>   Department of Civil & Environmental Engineering
>   48-114, 15 Vassar Street, Cambridge, MA 02139, USA
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> 
> Quoting aces-admin at techsquare.com:
> 
> > hello beghein-
> >
> > are you certain that your login bits are set up
> > to load the module magick? i just tested this
> > and it worked for me...
> >
> > . ssh ts at ao.acesgrid.org
> > . ao: module list
> > . ao: qstat
> >
> > [greg]
> >
> >> Mime-Version: 1.0
> >> Date: Wed, 3 Aug 2005 16:32:03 -0400
> >> From: Caroline Beghein <beghein at mit.edu>
> >> Cc:
> >> Reply-To: ACES-support at mitgcm.org
> >>
> >> Hi
> >>
> >> Is there still something wrong with the cluster? Whether I log in to
> >> ao or geojr, I cannot start any job. If I type qsub ... or qstat I
> >> get "-bash: qstat : command not found"
> >> What does that mean?
> >>
> >> Thanks
> >>
> >>
> >> --
> >> 	Caroline
> >>
> >>
> >>
> >> Caroline Beghein
> >> 77 Massachusetts avenue #54-526
> >> Cambridge, MA 02139
> >> tel.: +1 617 253 3589
> >> http://www.mit.edu/~beghein
> >> _______________________________________________
> >> Aces-support mailing list
> >> Aces-support at acesgrid.org
> >> http://acesgrid.org/mailman/listinfo/aces-support
> >>
> >
> >
> 
> 
> 
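The chained-restart scheme Peter describes (each job writes the script for the next segment and calls qsub as its last step) can be sketched as follows. This is a minimal illustration, assuming the driver logic is written in Python rather than in the bash PBS script he actually uses; the file naming, job name, walltime and model command are hypothetical placeholders, not values from the ao setup. The sketch resolves qsub explicitly before submitting, so that if qsub is missing from the PATH the batch job inherits (the "qsub: command not found" failure above), the job stops with an explicit error. $PBS_O_WORKDIR is the submission directory that Torque/PBS sets for each job.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Hypothetical template for the follow-on job. The job name, walltime and
# model command are placeholders, not values taken from the ao configuration.
PBS_TEMPLATE = """#!/bin/bash
#PBS -N {name}
#PBS -l walltime={walltime}
cd "$PBS_O_WORKDIR"
{command}
"""


def write_next_job(workdir: Path, part: int, walltime: str, command: str) -> Path:
    """Write the PBS script for segment `part` of a chained run; return its path."""
    name = f"run_part{part:03d}"  # hypothetical naming scheme
    script = workdir / f"{name}.pbs"
    script.write_text(PBS_TEMPLATE.format(name=name, walltime=walltime, command=command))
    return script


def find_qsub(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of qsub on `search_path` (default: $PATH), or None."""
    return shutil.which("qsub", path=search_path)


def submit(script: Path, search_path: Optional[str] = None) -> str:
    """Submit `script` with qsub and return the job id that qsub prints."""
    qsub = find_qsub(search_path)
    if qsub is None:
        # The situation reported above: qsub works in a login shell but is
        # not on the PATH that the running batch job sees.
        raise RuntimeError("qsub not found on PATH inside the batch job")
    result = subprocess.run([qsub, str(script)], check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

If qsub is found in a login shell but not inside a job, one thing worth checking (an assumption, not a diagnosis confirmed in this thread) is whether the job script itself sets up the module environment and loads the torque module before calling qsub, since a batch job does not necessarily run the same startup files as an interactive login.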


