[GE users] Sun GridEngine execution hosts not finding config

Reuti reuti at staff.uni-marburg.de
Fri Mar 31 09:32:58 BST 2006


Hi,

On 30.03.2006 at 13:31, Richard Hobbs wrote:

> Hello,
>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: 29 March 2006 22:19
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Sun GridEngine execution hosts not
>> finding config
>>
>> Hi,
>>
>> On 29.03.2006 at 13:52, Richard Hobbs wrote:
>>
>>> Hello,
>>>
>>> We have a number of Sun GridEngine execution hosts with qmaster on a
>>> separate machine. Most of the 44 hosts work perfectly, but a few of
>>> them are reporting "local configuration host.domain not defined -
>>> using global
>>
>> do you need any local configuration at all? I remember this only for
>> 5.3, and as all nodes used the same configuration I simply ignored
>> this message, as it was intended to work this way.
>
> No, I do not need any local configuration, and I have never set any up,
> but the reason I am concerned is that after printing this error message
> to the screen, "qmon" shows all queues on that host as red. Not disabled,
> or alarm, just red with no disabled/suspended/alarm status bar at the
> bottom of each queue at all.
>

If they are red, they are not available. In the output of "qhost" they
should also show only dashes for their entries, and an "au" state in
"qstat -f". Did you change any setting of the network configuration?

Is there anything in:

$SGE_ROOT/default/common/local_conf

Cheers - Reuti
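
[Editorial sketch, not part of the original message.] The checks mentioned
above could be scripted roughly as follows. This is only a sketch under
assumptions: the function name hosts_without_local_conf and the file
exec_hosts.txt are hypothetical, the cell is assumed to be named "default",
and it assumes each local configuration is stored as a file named after the
host inside the local_conf directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): print every host from a
# list file ($1, one host name per line) that has no file of the same
# name in the given local_conf directory ($2).
hosts_without_local_conf() {
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip empty lines in the host list.
        [ -n "$host" ] || continue
        # Report the host if no per-host file exists.
        [ -f "$2/$host" ] || printf '%s\n' "$host"
    done < "$1"
}

# Possible use on the qmaster (paths and file names are assumptions):
#   hosts_without_local_conf exec_hosts.txt "$SGE_ROOT/default/common/local_conf"
#   qconf -sconfl   # hosts for which a local configuration is defined
#   qhost           # unavailable hosts show only dashes
#   qstat -f        # queues on unavailable hosts show "au"
```

A missing file here is not necessarily a fault. As discussed in this thread,
hosts without a local configuration simply fall back to the global one, so
the output is only a starting point for comparing working and broken hosts.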


>> Which SGE version are you using?
>
> 5.3p6
>
>> What is qconf -sconfl saying?
>
> 'qconf -sconfl' is reporting a list of hosts. However, the hosts that
> are reporting "no local config" are not in the list. The hosts that are
> "broken" are on the network though, and they are able to ping the
> qmaster, and also to mount the NFS export on the qmaster containing the
> SGE binaries, so the network doesn't seem to be an issue.
>
> Any ideas?
>
> Thanks again,
> Richard.
>
>
>> -- Reuti
>>
>>
>>> configuration" when starting up. Obviously, I've replaced the hostname
>>> and domain with "host.domain" here, to protect our hostnames.
>>>
>>> Earlier today, nearly 20 of the hosts were reporting this, and the only
>>> way to solve it was to reboot the qmaster machine.
>>>
>>> Restarting the qmaster daemon, or rebooting the execution hosts or
>>> daemons did nothing.
>>>
>>> Now that I have rebooted the qmaster, all hosts are fixed apart from
>>> one or two.
>>>
>>> There is absolutely nothing in
>>> $SGE_ROOT/default/spool/qmaster/messages at all, apart from the usual
>>> starting up messages.
>>>
>>> Does anyone know what could be causing this?
>>>
>>> Thanks in advance :-)
>>>
>>> Richard.
>>>
>>> -- 
>>> Richard Hobbs (Systems Administrator)
>>> Toshiba Research Europe Ltd. - Speech Technology Group
>>> Web: http://www.toshiba-europe.com/research/
>>> Normal Email: richard.hobbs at crl.toshiba.co.uk
>>> Mobile Email: mobile at mongeese.co.uk
>>> Tel: +44 1223 376964        Mobile: +44 7811 803377
>>>
>>>
>>>
>>>
>>> _____________________________________________________________________
>>> This e-mail has been scanned for viruses by Verizon Business
>>> Internet Managed Scanning Services - powered by MessageLabs. For
>>> further information visit http://www.mci.com
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
> -- 
> Richard Hobbs (Systems Administrator)
> Toshiba Research Europe Ltd. - Speech Technology Group
> Web: http://www.toshiba-europe.com/research/
> Normal Email: richard.hobbs at crl.toshiba.co.uk
> Mobile Email: mobile at mongeese.co.uk
> Tel: +44 1223 376964        Mobile: +44 7811 803377



