[GE users] controlling jobs on failed nodes
serge.nosov2 at gmail.com
Thu Aug 13 00:10:06 BST 2009
I was wondering if there is a way to make GE terminate or reschedule a job when the node it is running on does not respond for a specified period of time. With my current setup (6.1u5), if a node goes down in the middle of running a job, the job stays in the "running" state forever.