[GE users] Best way to shut down an SGE cluster

hawson beckerjes at mail.nih.gov
Tue Feb 9 21:54:46 GMT 2010


On Tue, Feb 09, 2010 at 04:38:57PM -0500, templedf wrote:
>The fast way to shut down a cluster is "qconf -kej all -km".  That stops 
>the qmaster and all execds cleanly and aborts all running jobs.  The 
>jobs will need to be resubmitted.
>
>If you have time for an extra step or two, you can "qmod -d \*; qmod -rj 
>\*; qconf -kej all -km".  That will disable all queues and then requeue 
>all jobs, before stopping all the execds and master.  When you bring the 
>cluster back up, run "qmod -e \*" to bring your queues back online, and 
>the jobs should automatically be rescheduled.  This all requires that 
>your queues have rerun set to TRUE, though.
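The requeue-first sequence quoted above can be sketched as a small shell script. This is a minimal sketch, not a tested procedure. It assumes the SGE client commands (qmod, qconf) are on PATH, that it is run by an SGE manager account, and that every queue has rerun set to TRUE. The DRY_RUN variable and the function names are hypothetical, added only so the commands can be previewed without touching a live cluster.

```shell
#!/bin/sh
# Sketch of the "disable, requeue, then stop" shutdown described above.
# Assumes every queue has rerun=TRUE; otherwise requeued jobs are lost.
# DRY_RUN is a hypothetical switch for this sketch: when it is 1,
# commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"          # preview only
    else
        "$@"               # actually run the SGE command
    fi
}

sge_graceful_stop() {
    # 1. Disable all queues so nothing new is dispatched.
    run qmod -d '*'
    # 2. Reschedule (requeue) all running jobs.
    run qmod -rj '*'
    # 3. Kill all execds (and their jobs), then the qmaster.
    run qconf -kej all -km
}

sge_restart_queues() {
    # Run after the daemons are back up: re-enable all queues so the
    # requeued jobs can be scheduled again.
    run qmod -e '*'
}
```

The single-quoted '*' plays the same role as the \* in the original commands: it keeps the shell from expanding the asterisk against local filenames before qmod sees it. Previewing with `DRY_RUN=1 sge_graceful_stop` prints the three commands in order.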

Does qmod -rj reschedule the jobs as the user who originally submitted
them, or as the user running the qmod command?  I've had problems with
this in the past, and have since resorted to various script hackery (run
as root) to dump users' jobs and resubmit them with various (ab)uses of su
and qresub.
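One way to check the ownership question empirically is to snapshot job owners before the requeue and compare them after the cluster is back up. The helper below is a hypothetical sketch: it assumes the classic SGE 6.x `qstat -u '*'` layout (two header lines, job-ID in column 1, user in column 4) and that rescheduled jobs keep their job IDs. Both assumptions should be checked against your installation.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -u '*'` output on stdin and print
# "job-ID user" pairs, numerically sorted by job ID.
# Assumes two header lines, then job-ID in column 1 and user in column 4.
job_owners() {
    awk 'NR > 2 && NF >= 4 { print $1, $4 }' | sort -n
}

# Usage sketch (commented out so nothing runs against a live cluster):
#   qstat -u '*' | job_owners > owners.before
#   ... requeue, shut down, restart, re-enable queues ...
#   qstat -u '*' | job_owners > owners.after
#   diff owners.before owners.after
```

Any job whose owner changed shows up as a differing line in the diff. Newly submitted jobs will also appear there, so read the output with that in mind.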



-- 
Jesse Becker
NHGRI Linux support (Digicon Contractor)

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=244154



