[GE users] Jobs still shown as running after process has died

reuti reuti at staff.uni-marburg.de
Thu Aug 12 22:13:23 BST 2010


Hi,

On 12.08.2010 at 17:59, robhorton wrote:

> I've noticed a few cases recently where jobs appear in qstat as running,
> although the actual process on the execution host has died. I know this
> happens when a host is in an unknown state, but it is currently
> happening on a host which is (apparently) healthy and running another
> job normally. The jobs normally disappear when the execd is restarted.
> I'm not too worried about the jobs dying per se, but it would be nice if
> their execution slot could be cleared without manual intervention.
> 
> Any thoughts?

Was this a serial or a parallel job? Parallel jobs are known to be cleared only with some delay after they finish.

What do you mean by "the actual process ... has died"? Does it hang, or has it disappeared from the process list (so that only the shepherd is left hanging around there)?

-- Reuti


> Rob
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274018
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274091



