[GE users] sge_execd died without any trace

fansn fansn at hotmail.com
Fri Apr 16 16:07:35 BST 2010



Hi Marco,

Thanks for the information. I tried the other debug levels, with the same result. But I did find something in qmaster/messages. When sge_execd dies, the qmaster logs the following:

03/30/2010 09:54:33|worker|edagrid|E|cqueue_list_locate_qinstance("(null)@(null)"): cqueue == NULL("(null)", "(null)", 1, 0
03/30/2010 09:54:33|worker|edagrid|E|writing job finish information: can't locate queue "(null)@(null)"
04/06/2010 10:08:46|worker|edagrid|E|cqueue_list_locate_qinstance("(null)@(null)"): cqueue == NULL("(null)", "(null)", 1, 0
04/06/2010 10:08:46|worker|edagrid|E|writing job finish information: can't locate queue "(null)@(null)"
04/16/2010 14:54:47|worker|edagrid|E|cqueue_list_locate_qinstance("<unknown queue>"): cqueue == NULL("<unknown queue>", "<null>", 0, 0

It died on 03/30, 04/06, and 04/16, on different execution hosts.
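
To line these dates up against the execd failures, the error entries can be pulled out of qmaster/messages and grouped by day. This is only a minimal sketch: it assumes every line has the pipe-separated layout seen above (timestamp|thread|host|level|text), and the function names and field names are my own labels, not part of SGE.

```python
from collections import defaultdict


def parse_message_line(line):
    """Split one 'timestamp|thread|host|level|text' line.

    Returns None if the line does not have five pipe-separated fields.
    The field names are illustrative labels, not official SGE terms.
    """
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, thread, host, level, text = parts
    return {
        "timestamp": timestamp,
        "thread": thread,
        "host": host,
        "level": level,
        "text": text,
    }


def errors_by_day(lines):
    """Group the text of level 'E' entries by the date part of the timestamp."""
    grouped = defaultdict(list)
    for line in lines:
        record = parse_message_line(line)
        if record is None or record["level"] != "E":
            continue
        day = record["timestamp"].split(" ", 1)[0]
        grouped[day].append(record["text"])
    return dict(grouped)


# Two of the lines quoted above, used as sample input.
sample = [
    '03/30/2010 09:54:33|worker|edagrid|E|writing job finish information: '
    'can\'t locate queue "(null)@(null)"',
    '04/06/2010 10:08:46|worker|edagrid|E|writing job finish information: '
    'can\'t locate queue "(null)@(null)"',
]

if __name__ == "__main__":
    for day, messages in sorted(errors_by_day(sample).items()):
        print(day, len(messages))
```

In practice the sample list would be replaced by the lines read from the qmaster messages file, and the resulting days compared with the days on which sge_execd disappeared.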

Could there be something wrong with my queue configuration?

Yours sincerely,

Sinong Fan


________________________________
Date: Fri, 16 Apr 2010 16:58:20 +0200
From: marco.donauer at sun.com
To: users at gridengine.sunsource.net
CC: fansn at hotmail.com
Subject: Re: [GE users] sge_execd died without any trace

Did you also try another debug level, and starting the execd without the startup script,
executing the binary directly?
The messages file doesn't show anything? Does the qmaster get any information or error in its messages file?

Marco


Am 16.04.2010 16:51, schrieb fansn:


Hi Everyone,


I'm using SGE 6.2u5 on Red Hat Enterprise Linux 5 (2.6.18-164.6.1.el5), upgraded from 6.2u3 one month ago. The master is very stable and has been running for more than a month. Everything works very well, except that on some execution nodes the sge_execd daemon disappears for no known reason after an unpredictable period, and nothing is left in the log file. However, the shepherd daemons continue running after sge_execd dies.



I'm trying to debug the process. I set the debug level to 5 (dl 5), but when I restart the daemon it just displays "starting sge_execd", although the sge_execd process is running. (The startup script does not finish either.)



Has anyone had a similar problem? Any comments would be great. Many thanks.


Yours sincerely,




Sinong Fan









