[GE users] sge 5.3p4: CONNECTION TIMED OUT

Olesen, Mark Mark.Olesen at arvinmeritor.com
Wed May 19 09:41:31 BST 2004


After half a year of rock-solid, stable operation, my GridEngine installation
started acting up a few weeks ago.

After an arbitrary period of time (1 to 12 hours), GridEngine hangs and
the following error message appears in /var/spool/.../messages on the
qmaster host:

Tue May 18 20:04:43 2004|execd|dealc01|E|can't send asynchronous message to
commproc (qmaster:0) on host "dealc01": CONNECTION TIMED OUT

(i.e., the qmaster cannot find itself!)
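
To see whether the timeouts cluster at particular times of day, the entries
could be pulled out of the messages file with a small script. This is only a
rough sketch in Python, assuming each entry sits on a single line in the
pipe-separated layout quoted above (date|daemon|host|level|text; the line
wrap in this mail comes from the mail client). The function names are
made up for illustration and are not part of GridEngine.

```python
from datetime import datetime

def parse_message_line(line):
    """Split one 'date|daemon|host|level|text' entry; return None if it does not fit."""
    # Assumption: five pipe-separated fields, as in the entry quoted above.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    stamp, daemon, host, level, text = parts
    try:
        # Timestamp layout taken from the quoted entry, e.g. "Tue May 18 20:04:43 2004".
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    return {"when": when, "daemon": daemon, "host": host,
            "level": level, "text": text}

def connection_timeouts(lines):
    """Return the parsed error entries that mention CONNECTION TIMED OUT."""
    hits = []
    for line in lines:
        entry = parse_message_line(line)
        if entry and entry["level"] == "E" and "CONNECTION TIMED OUT" in entry["text"]:
            hits.append(entry)
    return hits

# The entry quoted above, joined back onto one line.
sample = ("Tue May 18 20:04:43 2004|execd|dealc01|E|can't send asynchronous "
          "message to commproc (qmaster:0) on host \"dealc01\": "
          "CONNECTION TIMED OUT")

for entry in connection_timeouts([sample]):
    print(entry["when"].isoformat(), entry["daemon"], entry["host"])
```

Run against the real messages file (for example by passing an open file
object to connection_timeouts), the timestamps can then be compared with
cron jobs, backups or other periodic network activity.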


If I remember correctly, the 'sgecommdcntl -k' within the rcsge script fails
to kill the sge_commd.

Do I have a network problem, or is it something else?
If I use 'sgecommdcntl -t N' or 'sgecommdcntl -d', what should I be looking
for to aid with the diagnosis?

Thanks,



Dr. Mark Olesen
Thermofluid Dynamics Analyst
ArvinMeritor Light Vehicle Systems
Zeuna Staerker GmbH & Co. KG
Biberbachstr. 9
D-86154 Augsburg, GERMANY
tel: +49 (821) 4103 - 862
fax: +49 (821) 4103 - 7862
Mark.Olesen at ArvinMeritor.com
