[GE users] Execd not starting up

Sean Davis sdavis2 at mail.nih.gov
Wed Oct 15 21:01:29 BST 2008



On Wed, Oct 15, 2008 at 2:20 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> Hi,
>
> On 15.10.2008 at 20:03, Sean Davis wrote:
>
>> I have a node that was rebooted and now does not start execd.  The
>> cluster is running SGE 6.2 on openSUSE Linux.  With some
>> debugging output turned on, I get:
>>
>> /import/cluster/sge/bin/lx24-amd64/sge_execd
>> cl_com_read_alias_file() [cl_communication.c/2510] main
>> => host alias file is not existing
>
> Are you using a host_aliases file in $SGE_ROOT/default/common? Is this
> location shared between the nodes and the master?

Thanks, Reuti.  The answers are "no" and "yes" respectively.
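
For anyone retracing the checks discussed in this thread (host_aliases
file, shared common directory, hostname lookup, reachability of the
qmaster), here is a minimal Python sketch that could be run on the
affected node. It is not part of SGE and makes several assumptions:
SGE_ROOT defaults to /import/cluster/sge (inferred from the binary path
in the log), the cell is "default", the qmaster host name is read from
an act_qmaster file in the common directory, and the qmaster port falls
back to 6444 when SGE_QMASTER_PORT is unset. The function names are made
up for illustration; adjust the values for your installation.

```python
import os
import socket
from pathlib import Path


def common_dir(sge_root, cell="default"):
    """Return the cell's common directory, e.g. $SGE_ROOT/default/common."""
    return Path(sge_root) / cell / "common"


def has_host_aliases(sge_root, cell="default"):
    """True if a host_aliases file exists in the common directory."""
    return (common_dir(sge_root, cell) / "host_aliases").is_file()


def read_act_qmaster(sge_root, cell="default"):
    """Return the host name stored in act_qmaster, or None if it is missing.

    A missing file on the node would suggest the shared directory is not
    mounted there.
    """
    path = common_dir(sge_root, cell) / "act_qmaster"
    return path.read_text().strip() if path.is_file() else None


def lookup_roundtrip(name):
    """Resolve name to an address and back; None if either lookup fails."""
    try:
        addr = socket.gethostbyname(name)
        return addr, socket.gethostbyaddr(addr)[0]
    except OSError:  # covers socket.gaierror and socket.herror
        return None


def can_connect(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed defaults; override through the usual SGE environment variables.
    root = os.environ.get("SGE_ROOT", "/import/cluster/sge")
    cell = os.environ.get("SGE_CELL", "default")
    port = int(os.environ.get("SGE_QMASTER_PORT", "6444"))

    print("host_aliases present:", has_host_aliases(root, cell))
    print("local name round trip:", lookup_roundtrip(socket.gethostname()))
    master = read_act_qmaster(root, cell)
    print("act_qmaster:", master)
    if master:
        print("qmaster round trip:", lookup_roundtrip(master))
        print("qmaster port reachable:", can_connect(master, port))
```

If the round trip returns a different name than the one the qmaster
knows the node by, or the port check fails, that would be the next
place to look; a passing run only rules out these basic causes.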

Sean

>>
>> cl_commlib_get_endpoint_status() [cl_commlib.c/6132] main
>>  => waiting for SIRM with id 1
>>
>> cl_commlib_get_endpoint_status() [cl_commlib.c/6201] main
>>  => no SRIM for SIM with id 1
>>
>> cl_commlib_get_endpoint_status() [cl_commlib.c/6201] main
>>  => no SRIM for SIM with id 1
>>
>> cl_commlib_get_endpoint_status() [cl_commlib.c/6201] main
>>  => no SRIM for SIM with id 1
>>
>> cl_commlib_get_endpoint_status() [cl_commlib.c/6183] main
>>  => got SIRM for SIM with id: 1
>>
>> cl_com_tcp_open_connection_request_handler() [cl_tcp_framework.c/1771]
>> execd_read          => select interrupted (errno=EINTR)
>>
>> cl_com_handle_read_thread() [cl_commlib.c/6935] execd_read          =>
>> got select interrupt
>>
>> cl_com_tcp_open_connection_request_handler() [cl_tcp_framework.c/1771]
>> execd_read          => select interrupted (errno=EINTR)
>>
>> cl_com_handle_read_thread() [cl_commlib.c/6935] execd_read          =>
>> got select interrupt
>>
>> cl_com_tcp_open_connection_request_handler() [cl_tcp_framework.c/1771]
>> execd_read          => select interrupted (errno=EINTR)
>>
>> Ad infinitum, pretty much.
>>
>> Any suggestions on where to look next?  There is no firewall on the
>> machine and the hostname looks up correctly.  The qmaster is running
>> and responding for all the other machines in the cluster.
>>
>> Thanks,
>> Sean
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>




