[GE users] sge_execd fails to start after crash and reboot

Filipe Brandenburger filipe.brandenburger at idilia.com
Thu Jun 19 18:38:23 BST 2008



Reuti wrote:
>> My question here is: Is there a way that I can tune this to avoid this
>> situation? Can I change the timeouts/heartbeats to make the master see
>> that a machine is unreachable more quickly?
> 
> You can change "max_unheard" in SGE's configuration (qconf -mconf), but
> it shouldn't be smaller than "load_report_time".

OK! I reduced max_unheard from 5 minutes to 2 minutes. I guess that should
be enough time for the master to see the node is down before it is back
again.

I also reduced load_report_time from 40s to 15s, just in case.
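
For anyone who wants to sanity-check similar values before saving them with
qconf -mconf, here is a minimal sketch of the rule Reuti mentioned
(max_unheard should not be smaller than load_report_time). It assumes the
values are written as HH:MM:SS, as they appear in the configuration. The
helper names to_seconds and check_timeouts are made up for illustration and
are not part of SGE.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def check_timeouts(conf: dict) -> bool:
    """Return True if max_unheard is not smaller than load_report_time."""
    return to_seconds(conf["max_unheard"]) >= to_seconds(conf["load_report_time"])


# The values from this message: 2 minutes and 15 seconds.
# (Assumption: both are stored in HH:MM:SS form.)
settings = {
    "max_unheard": "00:02:00",
    "load_report_time": "00:00:15",
}

# The previous values mentioned in this message: 5 minutes and 40 seconds.
previous = {
    "max_unheard": "00:05:00",
    "load_report_time": "00:00:40",
}
```

With the new values, max_unheard is 120 seconds against a 15-second
load_report_time, so the constraint holds with plenty of margin.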

I'll let you know if the problem happens again (unlikely).

Thanks!
Filipe




