[GE users] master configuration - timeout when exec host freezes

reuti reuti at staff.uni-marburg.de
Tue Nov 10 10:24:11 GMT 2009


On 10.11.2009 at 09:55, madpower wrote:

> Hi folks,
> we recently reinstalled our SGE (6.2) using the packages delivered  
> with the (k)ubuntu repositories. Now, we encounter the following  
> problem:
> Whenever an execution host freezes (terminates, loses network  
> connection, etc.) the qhost command correctly shows a load of "--".  
> Nevertheless, the jobs executed on this host at time of termination  
> are still shown as running - and cannot be deleted or rescheduled.
> Our last installation (SGE 6.2) was done by hand using the sources  
> from the website. There, the behavior was that the jobs of  
> execution hosts marked with load "--" were automatically rescheduled.
> So now we wonder whether there is a setting/parameter for the master  
> which sets the timeout until the master decides to reschedule these  
> specific jobs on machines marked with load "--".

Please check the entries "reschedule_unknown" and "max_unheard" in  
`man sge_conf`.
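
As a rough sketch of how the two entries might look in the global
configuration (displayed with `qconf -sconf`, edited with `qconf -mconf`).
The values are illustrative assumptions, not recommendations, and the
"#" lines are annotations for the reader, not part of the configuration:

```
# Times are given as hh:mm:ss.
# If the qmaster hears nothing from a host's execd for this long,
# the host's queues go to state "unknown" (load shown as "--").
max_unheard          00:05:00
# Once a host is unknown, wait this long before rescheduling its jobs.
# 00:00:00 turns rescheduling off.
reschedule_unknown   00:10:00
```

As far as I recall from the man page, only jobs that are rerunnable
(submitted with `qsub -r y`, or running in a queue with "rerun" set to
TRUE) are rescheduled this way, so it is worth checking that as well.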

-- Reuti

> Thanks in advance for any advice,
> Matthias
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=225968
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].


