[GE users] master configuration - timeout when exec host freezes

madpower prandtstetter at ads.tuwien.ac.at
Wed Nov 11 10:02:34 GMT 2009


> please check the entries "reschedule_unknown" and "max_unheard" in  
> `man sge_conf`.
Thanks for the pointer. The reschedule_unknown parameter works as expected and desired, but max_unheard seems to be disregarded.
In fact, I could observe the following behavior:
*) If max_unheard is set to a value smaller than load_report_time, then about 20 minutes after applying this setting the master reports that it has no information on the state of some execution hosts; that state is updated as soon as the next load report is sent.
*) If max_unheard is set to a value larger than load_report_time, it takes approx. 20-30 minutes until the master recognizes that an execution host is unavailable.
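
To tell which of the two cases above applies to a given configuration, the relevant entries can be compared mechanically. The following is only a minimal sketch: it assumes the settings are available as "name value" lines (the layout that `qconf -sconf` prints) with times written as HH:MM:SS, and the values in SAMPLE are hypothetical, not taken from my cluster. The helper names to_seconds and parse_conf are made up for this example.

```python
def to_seconds(hms: str) -> int:
    """Convert a time value such as '00:05:00' (HH:MM:SS) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_conf(text: str) -> dict:
    """Read 'name value' lines into a dict; lines without a value are skipped."""
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


# Hypothetical values for illustration only (assumption, not real cluster data).
SAMPLE = """\
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:10:00
"""

conf = parse_conf(SAMPLE)
report = to_seconds(conf["load_report_time"])
unheard = to_seconds(conf["max_unheard"])

# Report which of the two observed cases this configuration falls into.
if unheard < report:
    case = "max_unheard is smaller than load_report_time"
else:
    case = "max_unheard is not smaller than load_report_time"
print(case, f"({unheard}s vs {report}s)")
```

With the sample values this falls into the second case, so according to the documentation I would expect a frozen host to be flagged after roughly max_unheard, not after 20-30 minutes.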

Does anyone have an idea what is going wrong here? Or has anyone already experienced similar behavior?


