[GE users] Restarting sge_execd does not clear hung job status

reuti reuti at staff.uni-marburg.de
Thu Oct 7 10:45:27 BST 2010


Hi,

On 07.10.2010 at 00:08, coffman wrote:

> I recently moved from 6.0u8 to 6.2u5 and am noticing a different behavior that I could use some help with. On the previous version of grid we would occasionally have a grid system hang in such a way that it would need to be rebooted. When this happened, the job info related to the job would be cleared from the scheduler.
> 
> Version 6.2u5 does not behave the same way. The system running a particular job has been rebooted, so the job is definitely no longer running. When the system comes back up, sge_execd is started on the exechost. A qstat still shows the job as running on the host that was rebooted. Any clues as to why it does not get cleaned up?

Is the (local) spool directory of the node removed when the node is rebooted?

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=286404
