[Aces-support] -bash: qstat : command not found

Peter H Israelsson peteri at MIT.EDU
Thu Aug 4 03:38:28 EDT 2005


I am having a similar but slightly different problem with the qsub command, as
detailed below.

First I need to explain how my jobs are run: Because my simulations are longer
than the max walltime of the queues available to me, I have rewritten my code
to automatically run as a sequence of smaller simulations.  The code
automatically stops itself when its run time is approaching the max walltime,
and signals the calling pbs script that the simulation needs to be restarted
from its current time level.  Before exiting, the calling pbs script creates a
new pbs script file and submits a new job, i.e., the last command it issues
before exiting is "qsub [new_job]".  That way, the next part of the simulation
is assigned a new job number, and the walltime is reset.
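
A minimal sketch of the chain described above, for readers who have not seen the setup. It is not my actual script: the file names, the restart flag file, the walltime value, and the executable name run_simulation are all hypothetical, and it assumes the block is saved as chain.sh in the working directory.

```shell
#!/bin/bash
# Sketch of a self-resubmitting job chain (all names are hypothetical).

# Submit command; can be overridden (e.g. QSUB_CMD=echo) for a dry run.
QSUB_CMD="${QSUB_CMD:-qsub}"
# File the simulation is assumed to create when it stops before finishing.
RESTART_FLAG="${RESTART_FLAG:-restart_needed}"

# Write the PBS script for the given segment number and print its file name.
write_next_job() {
    local next_segment="$1"
    local job_file="job_${next_segment}.pbs"
    cat > "$job_file" <<EOF
#!/bin/bash
#PBS -N sim_seg${next_segment}
#PBS -l walltime=12:00:00
cd "\$PBS_O_WORKDIR"
./run_simulation
. ./chain.sh && resubmit_if_needed ${next_segment}
EOF
    echo "$job_file"
}

# If the simulation asked for a restart, create and submit the next segment.
# Does nothing when the flag file is absent (simulation finished).
resubmit_if_needed() {
    local segment="$1"
    if [ -f "$RESTART_FLAG" ]; then
        rm -f "$RESTART_FLAG"
        local job_file
        job_file="$(write_next_job $((segment + 1)))"
        "$QSUB_CMD" "$job_file"
    fi
}
```

Each generated job runs one segment and then calls resubmit_if_needed as its last step, so the qsub call happens inside the running job, which is where it now fails.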

This process was working fine before the ao system went down last night, i.e.,
the code stopped and restarted itself automatically with no problems. However,
since the system went down last night, this sequential process is now failing
because it says that it cannot find the qsub command:
/var/torque/mom_priv/jobs/8322.ao.SC: line 65: qsub: command not found
I have tested this a number of times and get the same result each time.

So something is different on ao since yesterday's reboot.  The strange thing is
that when I manually log on to ao.acesgrid.org, I do have access to qsub,
qstat, etc.  So I don't understand why the 'qsub' command doesn't work when
issued by an existing job.
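
One way to narrow this down from inside the job is sketched below. It assumes the cause is that the batch environment's PATH differs from the login shell's, which I have not confirmed. The helper name locate_cmd and the fallback directories in the usage comment are hypothetical; the real install location of qsub on ao would need to be checked with 'which qsub' in a login shell.

```shell
#!/bin/bash
# Locate a command by name: first via the current PATH, then in a list of
# fallback directories passed as extra arguments.
# Prints the result and returns 0 on success; returns 1 if nothing is found.
# Note: for shell builtins or functions, command -v prints the name, not a path.
locate_cmd() {
    local name="$1"
    shift
    local found dir
    if found="$(command -v "$name" 2>/dev/null)"; then
        echo "$found"
        return 0
    fi
    for dir in "$@"; do
        if [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Possible use at the end of a job script (directories are placeholders):
#   QSUB="$(locate_cmd qsub /usr/local/bin /opt/torque/bin)" || {
#       echo "qsub not found; PATH inside job is: $PATH" >&2
#       exit 1
#   }
#   "$QSUB" next_job.pbs
```

Printing PATH on failure makes it possible to compare the job's environment with what the login shell reports.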

Any ideas what is going on?  Thanks.

Regards,
Peter

PS Greg, I am confused by your last email because the module 'magick' you refer
to is not listed when I type 'module avail'.  Aren't the qsub, qstat, etc.
commands automatically loaded (in one of the 'default' modules such as
'torque/1.2.0p4')?  Also, I get an error when I try typing 'module load
magick', saying that the module cannot be found.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Peter H. Israelsson
  Massachusetts Institute of Technology
  Department of Civil & Environmental Engineering
  48-114, 15 Vassar Street, Cambridge, MA 02139, USA
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Quoting aces-admin at techsquare.com:

> hello beghein-
>
> are you certain that your login bits are setup
> to load the module magick ? i just tested this
> and it worked for me...
>
> . ssh ts at ao.acesgrid.org
> . ao: module list
> . ao: qstat
>
> [greg]
>
>> Mime-Version: 1.0
>> Date: Wed, 3 Aug 2005 16:32:03 -0400
>> From: Caroline Beghein <beghein at mit.edu>
>> Cc:
>> Reply-To: ACES-support at mitgcm.org
>>
>> Hi
>>
>> Is there still something wrong with the cluster? Whether I login to
>> ao or geojr, I cannot start any job. If I type qsub ... or qstat I
>> get "-bash: qstat : command not found"
>> What does that mean?
>>
>> Thanks
>>
>>
>> --
>> 	Caroline
>>
>>
>>
>> Caroline Beghein
>> 77 Massachusetts avenue #54-526
>> Cambridge, MA 02139
>> tel.: +1 617 253 3589
>> http://www.mit.edu/~beghein
>> _______________________________________________
>> Aces-support mailing list
>> Aces-support at acesgrid.org
>> http://acesgrid.org/mailman/listinfo/aces-support
>>




