[GE users] Best way to shut down an SGE cluster

templedf dan.templeton at sun.com
Tue Feb 9 21:58:43 GMT 2010

qmod -rj doesn't resubmit the jobs.  It just moves them from the running
state back to the queued-and-waiting (qw) state.  So, when the jobs are
rescheduled, they still belong to, and run as, the submitting users.


On 02/09/10 13:54, hawson wrote:
> On Tue, Feb 09, 2010 at 04:38:57PM -0500, templedf wrote:
>> The fast way to shut down a cluster is "qconf -kej all -km".  That stops
>> the qmaster and all execds cleanly and aborts all running jobs.  The
>> jobs will need to be resubmitted.
>> If you have time for an extra step or two, you can "qmod -d \*; qmod -rj
>> \*; qconf -kej all -km".  That will disable all queues and then requeue
>> all jobs, before stopping all the execds and master.  When you bring the
>> cluster back up, run "qmod -e \*" to bring your queues back online, and
>> the jobs should automatically be rescheduled.  This all requires that
>> your queues have rerun set to TRUE, though.
> Does the qmod -rj reschedule the jobs as the user who originally submitted
> them, or as the user running the qmod command?  I've had problems with
> this in the past, and have since resorted to various script hackery (as
> root), to dump users' jobs and resubmit them with various (ab)uses of su
> and qresub.
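
For reference, the slower sequence quoted above can be wrapped in a small
script.  This is only a sketch: the DRY_RUN variable and the run,
sge_shutdown and sge_bringup helpers are made-up names for illustration, not
part of Grid Engine.  It assumes the commands are run by an SGE manager and
that the queues have rerun set to TRUE.  By default it only prints what it
would do.

```shell
#!/bin/sh
# Sketch of the shutdown/bring-up sequence from this thread.
# DRY_RUN, run, sge_shutdown and sge_bringup are illustrative names only.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

sge_shutdown() {
    # 1. disable all queues so nothing new gets dispatched
    # 2. requeue running jobs (only works for queues with rerun TRUE)
    # 3. stop all execds, killing remaining jobs, then stop the qmaster
    # Each step runs only if the previous one succeeded.
    run qmod -d '*' &&
    run qmod -rj '*' &&
    run qconf -kej all -km
}

sge_bringup() {
    # After the qmaster and execds are back, re-enable the queues;
    # the requeued jobs should then be rescheduled automatically.
    run qmod -e '*'
}
```

Source the file and call sge_shutdown to review the commands first, then
set DRY_RUN=0 to actually execute them.  The quotes around * keep the shell
from expanding it, which is what the backslashes in the quoted one-liner do.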

