[GE users] Restarting sge_execd does not clear hung job status
michael.coffman at avagotech.com
Wed Oct 6 23:08:43 BST 2010
I recently moved from 6.0u8 to 6.2u5 and am noticing a change in behavior that I could use some help with. On the previous version of Grid Engine, a grid system would occasionally hang in such a way that it needed to be rebooted. When this happened, the information for the job running on that system was cleared from the scheduler.
Version 6.2u5 does not behave the same way. The system running a particular job has been rebooted, so the job is definitely no longer running. When the system comes back up, sge_execd is started on the exechost, yet qstat still shows the job as running on the host that was rebooted. Any clues as to why it does not get cleaned up?