[GE users] qdel by user broken with tight integration

reuti reuti at staff.uni-marburg.de
Sat Nov 22 18:29:36 GMT 2008


On 20.11.2008 at 18:42, Scott Beardsley wrote:

> I have tight integration with OpenMPI 1.2.6 + GE 6.1u4 working nicely.
> There is one problem that has been bugging me. When a node dies (out of
> mem, kernel panic, hardware, etc) the job hangs around until the user
> qdel's it. Then it enters the dr state and eventually must be removed by
> root via "qdel -f". Is there any way to have the job removed
> automatically and/or by the user? Also, is there any way to notify the
> user when the node dies (of course I can always do this out of band)?

You can look into the qmaster_params values "ENABLE_RESCHEDULE_SLAVE" and
"ENABLE_RESCHEDULE_KILL" (man sge_conf). They are available in 6.2; I
don't know in which release they first appeared.
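
As a rough sketch of where these would go: qmaster_params is a line in the
global cluster configuration, which can be shown with "qconf -sconf" and
edited with "qconf -mconf". The values below are assumptions to check
against the man page of your version. As far as I know, both parameters
only take effect when reschedule_unknown is set to a non-zero interval, and
the 00:05:00 here is an arbitrary illustration, not a recommendation.

```
# show the relevant lines of the current global configuration
qconf -sconf | egrep 'qmaster_params|reschedule_unknown'

# open the global configuration in $EDITOR and set, for example:
qconf -mconf
#   qmaster_params       ENABLE_RESCHEDULE_KILL=1,ENABLE_RESCHEDULE_SLAVE=1
#   reschedule_unknown   00:05:00
```

If your version does not know the parameters (you are on 6.1u4), the
entries will probably have no effect, so check the man page there first.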

-- Reuti

> Scott
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89249
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


