[GE users] sge_execd died without any trace
fansn at hotmail.com
Fri Apr 16 16:07:35 BST 2010
Thanks for the information. I tried other debug levels, with the same result. But I did find something in the qmaster messages file. Whenever sge_execd dies, the qmaster logs the following:
03/30/2010 09:54:33|worker|edagrid|E|cqueue_list_locate_qinstance("(null)@(null)"): cqueue == NULL("(null)", "(null)", 1, 0
03/30/2010 09:54:33|worker|edagrid|E|writing job finish information: can't locate queue "(null)@(null)"
04/06/2010 10:08:46|worker|edagrid|E|cqueue_list_locate_qinstance("(null)@(null)"): cqueue == NULL("(null)", "(null)", 1, 0
04/06/2010 10:08:46|worker|edagrid|E|writing job finish information: can't locate queue "(null)@(null)"
04/16/2010 14:54:47|worker|edagrid|E|cqueue_list_locate_qinstance("<unknown queue>"): cqueue == NULL("<unknown queue>", "<null>", 0, 0
It died on 03/30, 04/06 and 04/16, each time on a different execution host.
Could something be wrong with my queue configuration?
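To check whether these queue-lookup errors line up with the execd crash dates, the messages file can be filtered and grouped by day. This is a minimal Python 3 sketch. It assumes every line follows the `date time|thread|host|level|message` layout seen in the excerpt above. The helper names `parse_line` and `queue_lookup_errors` are hypothetical and not part of Grid Engine.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Assumed layout, inferred from the excerpt: MM/DD/YYYY HH:MM:SS|thread|host|level|message
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|([^|]*)\|([^|]*)\|([A-Z])\|(.*)$"
)


def parse_line(line):
    """Return (timestamp, thread, host, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S")
    return ts, m.group(2), m.group(3), m.group(4), m.group(5)


def queue_lookup_errors(lines):
    """Count error-level cqueue_list_locate_qinstance lines, grouped by calendar day."""
    per_day = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines that do not follow the assumed layout
        ts, _thread, _host, level, msg = rec
        if level == "E" and msg.startswith("cqueue_list_locate_qinstance"):
            per_day[ts.date()] += 1
    return per_day


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 <script> <path to the qmaster messages file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for day, count in sorted(queue_lookup_errors(fh).items()):
            print(day.isoformat(), count)
```

The dates printed can then be compared directly with the days on which sge_execd disappeared; the script name and file path are placeholders.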
Date: Fri, 16 Apr 2010 16:58:20 +0200
From: marco.donauer at sun.com
To: users at gridengine.sunsource.net
CC: fansn at hotmail.com
Subject: Re: [GE users] sge_execd died without any trace
Did you also try another debug level, and starting the execd without the startup script, i.e. executing the binary directly?
Does the execd messages file really show nothing? Does the qmaster log any information or errors in its messages file?
On 16.04.2010 16:51, fansn wrote:
I'm using SGE 6.2u5 on Red Hat Enterprise Linux 5 (kernel 2.6.18-164.6.1.el5), upgraded from 6.2u3 one month ago. The qmaster is very stable and has been running for more than a month. Everything works well, except that on some execution nodes the sge_execd daemon disappears for no apparent reason after an unpredictable period, and nothing is left in the log file. However, the shepherd processes keep running after sge_execd dies.
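Since nothing ends up in the log, it may help at least to record when the daemon vanishes, so the time can be compared with the qmaster messages file. This is a minimal Python 3 sketch. It assumes a Linux `/proc` filesystem and matches on the process name field of `/proc/<pid>/stat`. `pids_by_name` and `watch` are hypothetical helpers, and the 60-second polling interval is arbitrary.

```python
import os
import time
from datetime import datetime


def pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose name (2nd field of <proc_root>/<pid>/stat) equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                stat = fh.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        # The name sits between the first "(" and the last ")"; it may contain spaces.
        start, end = stat.find("("), stat.rfind(")")
        if start != -1 and end > start and stat[start + 1:end] == name:
            pids.append(int(entry))
    return sorted(pids)


def watch(name="sge_execd", interval=60, proc_root="/proc"):
    """Poll until no process called `name` remains, then print a timestamped notice."""
    while pids_by_name(name, proc_root):
        time.sleep(interval)
    shepherds = pids_by_name("sge_shepherd", proc_root)
    print(f"{datetime.now():%m/%d/%Y %H:%M:%S} {name} is no longer running; "
          f"sge_shepherd PIDs still alive: {shepherds}")


if __name__ == "__main__" and os.path.isdir("/proc"):
    watch()
```

Left running in the background on an affected node, it prints a single line once sge_execd is gone, along with any shepherds that survived it. It only timestamps the disappearance; it does not reveal the cause.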
I'm trying to debug the process. I set the debug level to 5 (dl 5), but when I restart the daemon it only displays "starting sge_execd", although the sge_execd process is running. (The startup script does not finish either.)
Has anyone seen a similar problem? Any comments would be appreciated. Many thanks.