[GE users] How to clear internal hostname cache?

Andy Schwierskott andy.schwierskott at sun.com
Tue Mar 21 11:22:47 GMT 2006


Hi,

the message

>> This host has the local hostname >compute-0-7.local<.

indicates that /etc/hosts lists the actual hostname as an alias on the
loopback line:

   127.0.0.1   localhost  compute-0-7.local

Some Linux distributions set it up this way.

Delete "compute-0-7.local", and any other name except "localhost", from
that line and this error should go away.

Andy
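
For anyone who wants to check several nodes for this, here is a minimal
sketch, not part of the original advice. It assumes the standard
/etc/hosts layout (address, then names, with "#" starting a comment); the
function name extra_loopback_names is made up for this example.

```python
def extra_loopback_names(hosts_text, keep=("localhost",)):
    """Return the names on 127.0.0.1 lines that are not listed in `keep`."""
    extras = []
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2 or fields[0] != "127.0.0.1":
            continue
        extras.extend(name for name in fields[1:] if name not in keep)
    return extras


# The loopback line from this thread, used as sample input.
sample = "127.0.0.1   localhost  compute-0-7.local\n"
print(extra_loopback_names(sample))  # ['compute-0-7.local']
```

To check a real node, pass the contents of /etc/hosts instead of the
sample string. Any name it reports is a candidate for removal from the
loopback line.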



On Tue, 21 Mar 2006, Chris Dagdigian wrote:

>
> I'm willing to bet that this hostname is defined somewhere on your system.
> I've wrestled with SGE hostname resolution issues on many clusters and in
> many complicated network, hostname and DNS resolving environments, and the
> root cause for name issues was *always* external and not within SGE.
>
> I've also not seen caching activity do anything significant when making 
> changes -- when I've fixed DNS or nameservice mistakes they are quickly 
> picked up by SGE.
>
> You did not mention testing with "gethostname", "gethostbyaddr" and the
> other utility binaries that should be in /opt/gridengine/utilbin/<arch> on
> your system. Try running those directly to see what SGE sees. After that,
> carefully make sure that what is in /etc/hosts matches what is being returned
> by forward and reverse DNS. Depending on your operating system there can also
> be other files and locations where hardcoded hostnames may be lying around.
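
A rough way to run the same forward/reverse comparison outside of SGE is
sketched below. This is an illustration only: it goes through the system
resolver via Python's socket module, not through the SGE utilbin binaries,
so it shows what the operating system returns rather than what SGE itself
sees. The function name check_name and its parameters are made up for this
example.

```python
import socket


def check_name(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> address -> names and report whether the round trip agrees.

    `forward` and `reverse` default to the system resolver calls; they are
    parameters only so the logic can be exercised without a network.
    """
    addr = forward(name)
    primary, aliases, _ = reverse(addr)
    names = [primary] + list(aliases)
    # Hostnames are case-insensitive, so compare in lower case.
    consistent = name.lower() in [n.lower() for n in names]
    return addr, names, consistent
```

Calling check_name(socket.gethostname()) on the affected node returns the
address, the names the reverse lookup gives back, and a flag that is False
when the reverse lookup yields a different name than the one you started
with.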
>
>
> -Chris
>
>
>
>
> On Mar 21, 2006, at 4:46 AM, Kim Leng Goh wrote:
>
>> Hi Christian,
>>   Thanks for the speedy reply.
>> 
>> On 3/21/06, christian reissmann <Christian.Reissmann at sun.com> wrote:
>> [...]
>>> 
>>> The cl_commlib.c module was developed for 6.0! The 5.3p6 version uses
>>> sge_commd to resolve hostnames and has no cache at all.
>>> So I don't understand the question.
>> [...]
>> 
>> My problem is that SGE seems to think that my compute-0-7 node has the
>> hostname "network-0-0.local" when in fact it doesn't (which prompted me
>> to think that this was in some cache somewhere or stored somewhere
>> else):
>> 
>> [root at compute-0-7 root]# qstat -f
>> denied: host "network-0-0.local" is neither submit nor admin host
>> 
>> 
>> Reinstalling sge on the compute node or reinstalling the compute node
>> doesn't seem to help:
>> 
>> 
>> [root at compute-0-7 gridengine]# ./install_execd -auto
>> 
>> Confirm Grid Engine default installation settings
>> -------------------------------------------------
>> 
>> The following default settings can be used for an accelerated
>> installation procedure:
>> 
>>       $SGE_ROOT          = /opt/gridengine
>>       service            = sge_commd
>>       admin user account = sge
>> 
>> Do you want to use these configuration parameters (y/n) [y] >>
>> denied: host "network-0-0.local" is neither submit nor admin host
>> 
>> 
>> 
>> Checking hostname resolving
>> ---------------------------
>> denied: host "network-0-0.local" is neither submit nor admin host
>> 
>> denied: host "network-0-0.local" is neither submit nor admin host
>> 
>> 
>> This host has the local hostname >compute-0-7.local<.
>> 
>> This host is unknown on the qmaster host.
>> 
>> Please make sure that you added this host as administrative host!
>> If you did not, please add this host now with the command
>> 
>>    # qconf -ah HOSTNAME
>> 
>> on your qmaster host.
>> 
>> Check again (y/n) [y] >>
>> 
>> Checking hostname resolving
>> ---------------------------
>> denied: host "network-0-0.local" is neither submit nor admin host
>> 
>> denied: host "network-0-0.local" is neither submit nor admin host
>> 
>> 
>> This host has the local hostname >compute-0-7.local<.
>> 
>> This host is unknown on the qmaster host.
>> 
>> Please make sure that you added this host as administrative host!
>> If you did not, please add this host now with the command
>> 
>>    # qconf -ah HOSTNAME
>> 
>> on your qmaster host.
>> 
>> If this host is already added as administrative host on your qmaster host
>> there may be a hostname resolving problem on this machine.
>> 
>> Please check your >/etc/hosts< file and >/etc/nsswitch.conf< file.
>> 
>> Hostname resolving problems will cause the problem that the
>> execution host will not be accepted by qmaster. Qmaster will
>> receive no load report values and show a load value
>> (>load_avg<) of 99.99 for this host.
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>> 
>
>
>
