[GE users] sge_execd fails to start after crash and reboot

Filipe Brandenburger filipe.brandenburger at idilia.com
Tue Jul 1 15:12:58 BST 2008



Hi Christian,

Christian Reissmann wrote:
> the info about the logging:
> 
> critical error: can't get configuration qmaster - terminating
> 
> was really helpful. It is only used in one part of the code (in execd).
> 
> The function sge_setup_sge_execd() terminates execd (SGE_EXIT(1))
> if it cannot get its configuration at startup after 3 tries.
> 
> The function get_conf_and_daemonize() was modified in 60u7. Perhaps
> this solves the problem for you.
> 
> The main difference is that the timeouts for getting the configuration
> are different and the value of allowed_get_conf_errors has increased.
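
For readers of the archive, the startup behaviour described above can be sketched roughly as follows. This is only an illustration of a bounded retry loop, not the actual execd code: the names get_conf_with_retries, ConfigUnavailable, max_errors and delay_seconds are hypothetical, and the real timeouts and error limits differ between SGE versions.

```python
import time


class ConfigUnavailable(Exception):
    """Hypothetical error: the configuration could not be fetched."""


def get_conf_with_retries(fetch, max_errors=3, delay_seconds=0.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_errors failures have occurred.

    Assumption: mirrors the "terminate after 3 tries" behaviour described
    in the thread; it is not taken from the SGE sources.
    """
    errors = 0
    while True:
        try:
            return fetch()
        except ConfigUnavailable:
            errors += 1
            if errors >= max_errors:
                # Corresponds to the "critical error ... terminating" case.
                raise SystemExit(1)
            # Wait before asking again (the timeout between attempts).
            sleep(delay_seconds)
```

Raising the error limit or lengthening the delay gives the master more time to come back after a crash before the daemon gives up, which appears to be the kind of change made in 60u7.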

Thanks a lot.

For now, I reduced the timeout (max_unheard) from 5 minutes to 2
minutes, and since then I haven't had the problem again (I worked around
it from the other end).

In August we are planning a big upgrade, probably to the latest update
of SGE 6.1. I will then set the timeouts back to their original values
and see if the problem persists.

Thanks a lot for your help!
Filipe
