[GE users] Best way to shut down an SGE cluster

prentice prentice at ias.edu
Tue Feb 9 22:03:19 GMT 2010


Awesome. That is exactly what I was looking for. Thanks for the quick
and succinct reply.

--
Prentice


Daniel Templeton wrote:
> The fast way to shut down a cluster is "qconf -kej all -km".  That stops
> the qmaster and all execds cleanly and aborts all running jobs.  The
> jobs will need to be resubmitted.
> 
> If you have time for an extra step or two, you can "qmod -d \*; qmod -rj
> \*; qconf -kej all -km".  That will disable all queues and then requeue
> all jobs, before stopping all the execds and master.  When you bring the
> cluster back up, run "qmod -e \*" to bring your queues back online, and
> the jobs should automatically be rescheduled.  This all requires that
> your queues have rerun set to TRUE, though.
> 
> Also note, the -ke or -kej has to come before the -km.  Nothing after
> the -km will have any effect.
> 
> Daniel
> 
> On 02/09/10 13:20, prentice wrote:
>> I'm sure this has been discussed many times on this list, but I couldn't
>> find a good answer by searching (I probably wasn't using the magical
>> combination of search terms to get my answer):
>>
>> If you need to shut down an entire cluster quickly (power outage, for
>> example) what is the best way to shut down SGE, so that there's minimal
>> disruption when the cluster starts up again?
>>
>> Will jobs that were running be restarted/requeued, or will they need to
>> be submitted again? I know that data will be lost if the jobs don't
>> provide their own checkpointing.
>>
>>    
>
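
For reference, the two shutdown paths Daniel describes could be wrapped up
roughly as below. This is only a sketch, not a tested procedure: the function
names, the run helper and the DRY_RUN variable are made up for illustration,
and the rerun check assumes that "qconf -sq <queue>" prints one
"attribute value" pair per line. With DRY_RUN=1 (the default here) the
commands are only printed, so the order can be reviewed before anything is
stopped.

```shell
#!/bin/sh
# DRY_RUN=1 (default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Fast path: stop all execds (aborting running jobs), then the qmaster.
# -kej must come before -km; nothing after -km has any effect.
sge_fast_shutdown() {
    run qconf -kej all -km
}

# Slower path: disable all queues, requeue all jobs, then stop everything.
# Steps run unconditionally, matching the ";"-separated sequence in the thread.
sge_requeue_shutdown() {
    run qmod -d '*'
    run qmod -rj '*'
    run qconf -kej all -km
}

# After the cluster is back up: bring the queues back online.
sge_reenable_queues() {
    run qmod -e '*'
}

# Read "qconf -sq <queue>" output on stdin and print the value of "rerun".
queue_rerun_value() {
    awk '$1 == "rerun" { print $2 }'
}

# List queues whose rerun value is not exactly TRUE (needs a live qmaster).
sge_list_non_rerun_queues() {
    for q in $(qconf -sql); do
        val="$(qconf -sq "$q" | queue_rerun_value)"
        [ "$val" = "TRUE" ] || echo "$q rerun=$val"
    done
}
```

Sourcing the file and calling sge_requeue_shutdown with DRY_RUN=1 just lists
the three commands in order. Running sge_list_non_rerun_queues beforehand
shows which queues would not requeue their jobs.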

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=244158
