[GE users] How to clear internal hostname cache?

Andy Schwierskott andy.schwierskott at sun.com
Tue Mar 21 11:48:55 GMT 2006


Kim Leng,

qmaster sees the execution host 10.255.255.247 as host "network-0-0.local".

The cause can be an error in hostname resolution, as Chris wrote, or the
execution host may have several network interfaces.

What's the output on the qmaster host when you enter:

    <sge-root>/utilbin/<arch>/gethostbyname compute-0-7.local
    <sge-root>/utilbin/<arch>/gethostbyname compute-0-7
    <sge-root>/utilbin/<arch>/gethostbyaddr 10.255.255.247
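
As a complement to the commands above, here is a minimal sketch that lists what
a hosts-format file itself says about an address or a name. The function names
and the HOSTS_FILE variable are illustrative assumptions, not part of Grid
Engine. The sketch reads only the file, so it will not reflect DNS, NIS or
anything else the system resolver consults.

```shell
#!/bin/sh
# Sketch only: inspect a hosts-format file directly.
# HOSTS_FILE is a hypothetical variable; it defaults to /etc/hosts.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

# names_for_ip IP
# Print every name listed on lines whose first field equals IP.
names_for_ip() {
    awk -v ip="$1" '
        { sub(/#.*/, "") }                      # drop comments
        $1 == ip { for (i = 2; i <= NF; i++) print $i }
    ' "$HOSTS_FILE"
}

# ips_for_name NAME
# Print every address whose line lists NAME (case-insensitive match).
ips_for_name() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                      # drop comments
        { for (i = 2; i <= NF; i++)
              if (tolower($i) == tolower(name)) print $1 }
    ' "$HOSTS_FILE"
}
```

After sourcing the functions on the qmaster host, "names_for_ip 10.255.255.247"
and "ips_for_name network-0-0.local" show whether that file ties the address
to the unexpected name. If both come back empty, the name is likely coming from
another source listed in /etc/nsswitch.conf.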

Andy



On Tue, 21 Mar 2006, Kim Leng Goh wrote:

> Hi Andy,
>  I do not have "127.0.0.1   localhost  compute-0-7.local" but:
>
> [root@compute-0-7 root]# head -5 /etc/hosts
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 127.0.0.1 localhost.localdomain localhost
> 172.18.36.248 frontend.foo.com
> 10.255.255.247  compute-0-7.local  compute-0-7
>
>
> I changed the last line above to "10.255.255.247  compute-0-7" and
> "qstat -f" still returns:
>
> [root@compute-0-7 root]# qstat -f
> denied: host "network-0-0.local" is neither submit nor admin host
>
>
> Thanks,
> KL
>
> On 3/21/06, Andy Schwierskott <andy.schwierskott at sun.com> wrote:
>> Hi,
>>
>> the message
>>
>>>> This host has the local hostname >compute-0-7.local<.
>>
>> indicates that in /etc/hosts the actual hostname is listed as an alias for
>>
>>    127.0.0.1   localhost  compute-0-7.local
>>
>> as happens in some Linux distributions.
>>
>> Delete "compute-0-7.local" and any other names except "localhost" from
>> that line and you'll be fine regarding this error.
>>
>> Andy
>>
>>
>>
>> On Tue, 21 Mar 2006, Chris Dagdigian wrote:
>>
>>>
>>> I'm willing to bet that this hostname is defined somewhere on your system.
>>> I've wrestled with SGE hostname resolution issues on many clusters and in
>>> many complicated network, hostname and DNS resolving environments and the
>>> root cause for name issues was *always* external and not within SGE.
>>>
>>> I've also not seen caching activity do anything significant when making
>>> changes -- when I've fixed DNS or nameservice mistakes they are quickly
>>> picked up by SGE.
>>>
>>> You did not mention testing with the "gethostname" and "gethostbyaddr" and
>>> the other utility binaries that should be in /opt/gridengine/utilbin/<arch>
>>> on your system. Try running those directly to see what SGE sees. After that,
>>> carefully make sure that what is in /etc/hosts matches what is being returned
>>> by forward and reverse DNS. Depending on your operating system there can also
>>> be other files and locations where hardcoded hostnames may be lying around.
>>>
>>>
>>> -Chris
>>>
>>>
>>>
>>>
>>> On Mar 21, 2006, at 4:46 AM, Kim Leng Goh wrote:
>>>
>>>> Hi Christian,
>>>>   Thanks for the speedy reply.
>>>>
>>>> On 3/21/06, christian reissmann <Christian.Reissmann at sun.com> wrote:
>>>> [...]
>>>>>
>>>>> The cl_commlib.c module was developed for 6.0! The 5.3p6 version uses
>>>>> sge_commd to resolve hostnames and has no cache at all.
>>>>> So I don't understand the question.
>>>> [...]
>>>>
>>>> My problem is that SGE seems to think that my compute-0-7 node has the
>>>> hostname "network-0-0.local" when in fact it isn't (which prompted me
>>>> to think that this was in some cache somewhere or stored somewhere
>>>> else):
>>>>
>>>> [root@compute-0-7 root]# qstat -f
>>>> denied: host "network-0-0.local" is neither submit nor admin host
>>>>
>>>>
>>>> Reinstalling sge on the compute node or reinstalling the compute node
>>>> doesn't seem to help:
>>>>
>>>>
>>>> [root@compute-0-7 gridengine]# ./install_execd -auto
>>>>
>>>> Confirm Grid Engine default installation settings
>>>> -------------------------------------------------
>>>>
>>>> The following default settings can be used for an accelerated
>>>> installation procedure:
>>>>
>>>>       $SGE_ROOT          = /opt/gridengine
>>>>       service            = sge_commd
>>>>       admin user account = sge
>>>>
>>>> Do you want to use these configuration parameters (y/n) [y] >>
>>>> denied: host "network-0-0.local" is neither submit nor admin host
>>>>
>>>>
>>>>
>>>> Checking hostname resolving
>>>> ---------------------------
>>>> denied: host "network-0-0.local" is neither submit nor admin host
>>>>
>>>> denied: host "network-0-0.local" is neither submit nor admin host
>>>>
>>>>
>>>> This host has the local hostname >compute-0-7.local<.
>>>>
>>>> This host is unknown on the qmaster host.
>>>>
>>>> Please make sure that you added this host as administrative host!
>>>> If you did not, please add this host now with the command
>>>>
>>>>    # qconf -ah HOSTNAME
>>>>
>>>> on your qmaster host.
>>>>
>>>> Check again (y/n) [y] >>
>>>>
>>>> Checking hostname resolving
>>>> ---------------------------
>>>> denied: host "network-0-0.local" is neither submit nor admin host
>>>>
>>>> denied: host "network-0-0.local" is neither submit nor admin host
>>>>
>>>>
>>>> This host has the local hostname >compute-0-7.local<.
>>>>
>>>> This host is unknown on the qmaster host.
>>>>
>>>> Please make sure that you added this host as administrative host!
>>>> If you did not, please add this host now with the command
>>>>
>>>>    # qconf -ah HOSTNAME
>>>>
>>>> on your qmaster host.
>>>>
>>>> If this host is already added as administrative host on your qmaster host
>>>> there may be a hostname resolving problem on this machine.
>>>>
>>>> Please check your >/etc/hosts< file and >/etc/nsswitch.conf< file.
>>>>
>>>> Hostname resolving problems will cause the problem that the
>>>> execution host will not be accepted by qmaster. Qmaster will
>>>> receive no load report values and show a load value
>>>> (>load_avg<) of 99.99 for this host.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>
>>>
>>>
>>>
>>
>
>
>




