[GE users] master configuration - timeout when exec host freezes

madpower prandtstetter at ads.tuwien.ac.at
Tue Nov 10 08:55:33 GMT 2009


Hi folks,

we recently reinstalled our SGE (6.2) using the packages from the (K)ubuntu repositories. Now we are encountering the following problem:
Whenever an execution host freezes (terminates, loses its network connection, etc.), the qhost command correctly shows its load as "--". However, the jobs that were running on that host when it failed are still shown as running, and they can be neither deleted nor rescheduled.

Our previous installation (also SGE 6.2) was built by hand from the sources on the website. There, jobs on execution hosts showing a load of "--" were automatically rescheduled.

So we wonder whether there is a setting or parameter on the master that controls the timeout after which the master reschedules the jobs on hosts showing a load of "--".

Thanks in advance for any advice,
Matthias

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=225968