[GE users] sge 6.2U3 scheduler problem

esneeh esneeh at marvell.com
Fri Dec 18 03:54:58 GMT 2009


Daniel, we restarted the daemon on the server.  This used to be our formula for fixing everything in SGE 5.3.

Thanks,
Eddie
---


-----Original Message-----
From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM] 
Sent: Thursday, December 17, 2009 5:24 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] sge 6.2U3 scheduler problem

You edit the global host config with "qconf -mconf".  There's an 
attribute called qmaster_params.  See sge_conf(5).
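
To check the current value before editing, something like the sketch below may help. It assumes the text printed by "qconf -sconf" has one attribute per line, as a name followed by whitespace and the value; the helper name get_conf_attr and the sample text are made up for illustration, and values wrapped across several lines are not handled.

```python
def get_conf_attr(conf_text, name):
    """Return the value of one attribute from 'qconf -sconf'-style text, or None."""
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and header comments such as '#global:'.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if parts[0] == name:
            return parts[1].strip() if len(parts) > 1 else ""
    return None


# Hypothetical sample; real text would come from running: qconf -sconf
SAMPLE_CONF = """\
#global:
execd_spool_dir              /opt/sge/default/spool
qmaster_params               none
"""

print(get_conf_attr(SAMPLE_CONF, "qmaster_params"))
```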

|E| messages aren't normal.  Not surprisingly, the E is for error.
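
A quick way to pull only the error entries out of the qmaster messages file, as a sketch. It assumes each entry is a single line of five "|"-separated fields (timestamp, thread, host, severity, text), which is what the lines quoted below look like; the function name error_entries is illustrative.

```python
def error_entries(lines):
    """Yield (timestamp, host, text) for entries whose severity field is 'E'."""
    for line in lines:
        # Split into at most 5 fields so any '|' inside the text is preserved.
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) == 5 and fields[3] == "E":
            timestamp, _thread, host, _severity, text = fields
            yield timestamp, host, text


# Two entries copied from the excerpt quoted in this thread.
SAMPLE = [
    "12/17/2009 14:50:51|worker|tspirit|I|job 89396.1 finished on host lcd-563",
    '12/17/2009 14:50:51|worker|tspirit|E|execd at lcd-770 reports running job '
    '(8626295.1/master) in queue "msi_1cpu.q at lcd-770" that was not supposed '
    "to be there - killing",
]

for timestamp, host, text in error_entries(SAMPLE):
    print(timestamp, host, text)
```

For a real file, an open file object can be passed in place of SAMPLE; the path depends on where your qmaster spool directory lives.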

What is the trigger for this condition?  Does it follow a qmaster 
restart?  Execd restart?

Daniel

esneeh wrote:
> Chris, Daniel, Fred, thanks for responding.
>
> 1. Fred, qconf -secl returns the name of the "ADMIN_HOST_LIST"  (1 server) that's in the config file.
>
> 2. Chris, yes, there is a relatively large number of pending jobs, but that's always been the case.
>    The messages file has entries like:
>
> 12/17/2009 14:50:51|worker|tspirit|E|execd at lcd-770 reports running job (8626295.1/master) in queue "msi_1cpu.q at lcd-770" that was not supposed to be there - killing
> 12/17/2009 14:50:51|worker|tspirit|I|removing trigger to terminate job 89396.1
> 12/17/2009 14:50:51|worker|tspirit|I|job 89396.1 finished on host lcd-563
>
> Is the |E| message something that's "normal"?
>
> 3. Daniel, I usually see the qrsh_control_port error together with the message:
>    "Can not get job info messages, scheduler is not available"
>    Where is the global host conf file?  I can't find qmaster_params in the GUI either.
>    The qmaster/schedd/messages file is 0 bytes; it contains nothing.
>
> Thanks again,
> Eddie
>
> ---
>
>
> -----Original Message-----
> From: fy [mailto:fly at anydata.co.uk] 
> Sent: Thursday, December 17, 2009 3:41 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] sge 6.2U3 scheduler problem
>
> Is the scheduler running?
>
> If you type "qconf -secl", do you get output like this?
>
>        ID NAME            HOST
> --------------------------------------------------
>         1 scheduler      ...
>
>
> Fred Youhanaie
>
>
> On 17/12/09 22:04, esneeh wrote:
>   
>> Hi everyone, I'm using SGE 6.2U3.  Jobs have stopped getting scheduled all of a sudden, and qstat is giving me the following message:
>> "Can not get job info messages, scheduler is not available"
>>
>> Does anyone know what might be causing this message and what can be done to get jobs running again?
>>
>>
>> Thanks for any advice,
>> Eddie
>> ---
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=233998
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>     
>



