[GE users] sge_execd problems

Mag Gam magawake at gmail.com
Fri Oct 17 18:56:16 BST 2008



Wow. I can't.

It seems my master had stopped running, so I restarted it, and now I get these
messages in the log:


10/17/2008 13:53:27|qmaster|master01.engrMec.unc.edu|I|read job
database with 163 entries in 1 seconds
10/17/2008 13:53:27|qmaster|master01.engrMec.unc.edu|I|qmaster hard
descriptor limit is set to 8192
10/17/2008 13:53:27|qmaster|master01.engrMec.unc.edu|I|qmaster soft
descriptor limit is set to 8192
10/17/2008 13:53:27|qmaster|master01.engrMec.unc.edu|I|qmaster will
use max. 8172 file descriptors for communication
10/17/2008 13:53:27|qmaster|master01.engrMec.unc.edu|I|qmaster will
accept max. 99 dynamic event clients
10/17/2008 13:53:27|qmaster|master01.engrMec.unc.edu|I|starting up GE
6.1u5 (lx24-amd64)
10/17/2008 13:53:28|qmaster|master01.engrMec.unc.edu|E|commlib error:
endpoint is not unique error (endpoint
"master01.engrMec.unc.edu/schedd/1" is already connected)
10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|E|cqueue_list_locate_qinstance("(null)@(null)"):
cqueue == NULL("(null)", "(null)", 1, 0
10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|E|writing job
finish information: can't locate queue "(null)@(null)"
10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|W|job 5014.1
failed on host <unknown host> before writing exit_status because:
shepherd exited with exit status 19
10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|C|!!!!!!!!!! got
NULL element for QU_rerun !!!!!!!!!!
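
The telnet check suggested in the quoted message below can also be scripted, which is handy for testing from several exec hosts. This is only a minimal sketch: the host name and port 536 are the ones from this thread, while the function name port_is_open and the 3 second timeout are assumptions made for illustration. It only tests that something accepts TCP connections on the port, not that the listener is a healthy qmaster.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper (not part of Grid Engine): it mimics what
    `telnet host port` tells you, nothing more.
    """
    try:
        # create_connection resolves the name and connects; both DNS
        # failures and "Connection refused" are raised as OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("master01.engrMec.unc.edu", 536) returning False would be consistent with the "Connection refused" error reported by qhost, or with the name not resolving from that host.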



On Fri, Oct 17, 2008 at 1:32 PM, Rayson Ho <rayrayson at gmail.com> wrote:
> Looks like a network resolution/connection problem... Are you able to
> connect to the master from the command line, like:
>
> % telnet master01.engrMec.unc.edu 536
>
> Rayson
>
>
> On 10/17/08, Mag Gam <magawake at gmail.com> wrote:
>> I have the sgemaster running on our head node, and on the clients I
>> am able to start up sge_execd.
>>
>> I see sge_execd process running on the client.
>>
>> But when I do
>>
>> $ qhost
>> error: commlib error: can't connect to service (Connection refused)
>> error: unable to contact qmaster using port 536 on host
>> "master01.engrMec.unc.edu"
>>
>>
>> When I start up the client I see no change in the messages file
>> either. Has anyone seen this before? I am using GE 6.1u5.
>>
>> TIA
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>>
>




