[GE users] Best way to shut down an SGE cluster
dan.templeton at sun.com
Tue Feb 9 21:38:57 GMT 2010
The fast way to shut down a cluster is "qconf -kej all -km". That stops
the qmaster and all execds cleanly and aborts all running jobs. The
jobs will need to be resubmitted.
If you have time for an extra step or two, you can run "qmod -d \*; qmod -rj
\*; qconf -kej all -km". That will disable all queues and then requeue
all jobs, before stopping all the execds and master. When you bring the
cluster back up, run "qmod -e \*" to bring your queues back online, and
the jobs should automatically be rescheduled. This all requires that
your queues have rerun set to TRUE, though.
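
The rerun requirement can be checked before relying on the requeue step. Here is a minimal sketch, assuming "qconf -sql" lists the queue names and "qconf -sq QUEUE" prints a "rerun" line with TRUE or FALSE, as on a stock SGE 6.x install. The helper name rerun_enabled is made up for illustration.

```shell
#!/bin/sh
# rerun_enabled: read a queue configuration (as printed by "qconf -sq QUEUE")
# on stdin and succeed only if its "rerun" attribute is exactly TRUE.
# A value with per-host overrides (e.g. "FALSE,[host=TRUE]") is treated as
# not enabled, which errs on the side of a warning.
rerun_enabled() {
    awk '$1 == "rerun" { found = 1; ok = (toupper($2) == "TRUE") }
         END { exit !(found && ok) }'
}

# Report every queue whose jobs would not be requeued by "qmod -rj".
# This part only runs when qconf is actually on the PATH.
if command -v qconf >/dev/null 2>&1; then
    for q in $(qconf -sql); do
        qconf -sq "$q" | rerun_enabled || echo "rerun is not TRUE for queue: $q"
    done
fi
```

Any queue it reports would need rerun changed to TRUE (for example by editing the queue with "qconf -mq QUEUE") before "qmod -rj" can requeue its jobs.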
Also note, the -ke or -kej has to come before the -km. Nothing after
the -km will have any effect.
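
Put together, the graceful sequence could be wrapped in a small script. This is only a sketch: the DRY_RUN variable and the run and sge_shutdown helpers are my own additions, not part of SGE. By default it just prints the commands so the order can be checked first.

```shell
#!/bin/sh
# Graceful SGE shutdown in the order described above.
# DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

sge_shutdown() {
    run qmod -d '*'           # disable all queues so nothing new is dispatched
    run qmod -rj '*'          # requeue running jobs (queues need rerun TRUE)
    run qconf -kej all -km    # -kej must precede -km; nothing after -km takes effect
}

sge_shutdown
```

Once the cluster is back up, "qmod -e \*" re-enables the queues.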
On 02/09/10 13:20, prentice wrote:
> I'm sure this has been discussed many times on this list, but I couldn't
> find a good answer by searching (I probably wasn't using the magical
> combination of search terms to get my answer):
> If you need to shut down an entire cluster quickly (power outage, for
> example) what is the best way to shut down SGE, so that there's minimal
> disruption when the cluster starts up again?
> Will jobs that were running be restarted/requeued, or will they need to
> be submitted again? I know that data will be lost if the jobs don't
> provide their own checkpointing.