[GE users] sge 6.2U3 scheduler problem

templedf dan.templeton at sun.com
Fri Dec 18 05:03:16 GMT 2009


What spooling method are you using?  Is there any particular pattern 
to which hosts have the jobs that are declared redundant?  How 
reproducible is the problem?

Daniel
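
A minimal sketch for answering the first two questions, assuming a standard
install where the spooling method is recorded in
$SGE_ROOT/$SGE_CELL/common/bootstrap and the qmaster log lives in the qmaster
spool directory.  The function names are hypothetical helpers, not part of
SGE, and the second function assumes the "execd@<host> reports running job"
wording shown in the quoted log below:

```shell
#!/bin/sh
# Sketch only: the file locations are assumptions about a default SGE layout.

# Print the spooling method (classic or berkeleydb) from a bootstrap file.
# $1: path to the bootstrap file
spooling_method() {
    awk '$1 == "spooling_method" { print $2 }' "$1"
}

# Count the |E| "reports running job" lines in a qmaster messages file,
# grouped by the exec host named in the message, busiest host first.
# $1: path to the qmaster messages file
unexpected_jobs_by_host() {
    awk -F'|' '
        # Field 4 is the log level, field 5 is the message text.
        $4 == "E" && $5 ~ /reports running job/ {
            msg = $5
            # Accept both "execd@host" and the archive form "execd at host".
            sub(/^execd(@| at )/, "", msg)
            split(msg, parts, " ")
            count[parts[1]]++
        }
        END { for (h in count) print count[h], h }
    ' "$1" | sort -rn
}
```

For example, spooling_method "$SGE_ROOT/$SGE_CELL/common/bootstrap" prints the
configured method, and unexpected_jobs_by_host run against the qmaster
messages file shows whether the errors cluster on a few hosts.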

esneeh wrote:
> Daniel, we restarted the daemon on the server.  This used to be our formula for fixing everything in SGE 5.3.
>
> Thanks,
> Eddie
> ---
>
>
> -----Original Message-----
> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM] 
> Sent: Thursday, December 17, 2009 5:24 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] sge 6.2U3 scheduler problem
>
> You edit the global host config with "qconf -mconf".  There's an 
> attribute called qmaster_params.  See sge_conf(5).
>
> |E| messages aren't normal.  Not surprisingly, the E is for error.
>
> What is the trigger for this condition?  Does it follow a qmaster 
> restart?  Execd restart?
>
> Daniel
>
> esneeh wrote:
>> Chris, Daniel, Fred, thanks for responding.
>>
>> 1. Fred, qconf -secl returns the name of the "ADMIN_HOST_LIST"  (1 server) that's in the config file.
>>
>> 2. Chris, yes, there is a relatively large number of pending jobs, but that's always been the case.
>>    The messages file has entries like:
>>
>> 12/17/2009 14:50:51|worker|tspirit|E|execd at lcd-770 reports running job (8626295.1/master) in queue "msi_1cpu.q at lcd-770" that was not supposed to be there - killing
>> 12/17/2009 14:50:51|worker|tspirit|I|removing trigger to terminate job 89396.1
>> 12/17/2009 14:50:51|worker|tspirit|I|job 89396.1 finished on host lcd-563
>>
>>    Is the |E| message something that's "normal"?
>>
>> 3. Daniel, I usually see the qrsh_control_port error together with the message:
>>    "Can not get job info messages, scheduler is not available"
>>    Where is the global host conf file?  I can't find qmaster_params in the GUI either.
>>    The qmaster/schedd/messages file is empty (size 0).
>>
>> Thanks again,
>> Eddie
>>
>> ---
>>
>>
>> -----Original Message-----
>> From: fy [mailto:fly at anydata.co.uk] 
>> Sent: Thursday, December 17, 2009 3:41 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] sge 6.2U3 scheduler problem
>>
>> Is the scheduler running?
>>
>> If you type "qconf -secl", do you get output like this?
>>
>>        ID NAME            HOST
>> --------------------------------------------------
>>         1 scheduler      ...
>>
>>
>> Fred Youhanaie
>>
>>
>> On 17/12/09 22:04, esneeh wrote:
>>> Hi everyone, I'm using SGE 6.2U3.  Jobs have suddenly stopped being scheduled, and qstat is giving me the following message:
>>> "Can not get job info messages, scheduler is not available"
>>>
>>> Does anyone know what might be causing this message and what can be done to get jobs running again?
>>>
>>>
>>> Thanks for any advice,
>>> Eddie
>>> ---
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=233998
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=234039

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


