[GE users] How to clear internal hostname cache?

Joe Landman landman at scalableinformatics.com
Wed Mar 22 02:16:43 GMT 2006



Comments embedded

Kim Leng Goh wrote:
> Hi Joe,
>   I think we are going somewhere(notice that I tried your commands on
> both the frontend and compute node. Take at look at "[root at compute-0-7
> root]# dig network-0-0" below:
> 
> On 3/22/06, Joe Landman <landman at scalableinformatics.com> wrote:
> 
>>Try
>>
>>        dig network-0-0
>>        dig network-0-0.local
>>
>>Lets see what it says.
> 
> 
> [root at frontend root]# dig network-0-0
> 
> ; <<>> DiG 9.2.4 <<>> network-0-0
> ;; global options:  printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 13576
> ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
> 
> ;; QUESTION SECTION:
> ;network-0-0.                   IN      A
> 
> ;; AUTHORITY SECTION:
> .                       86379   IN      SOA     A.ROOT-SERVERS.NET.
> NSTLD.VERISIGN-GRS.COM. 2006032100 1800 900 604800 86400

OK, this means the name could not be resolved, even after going all the
way up to the DNS root servers.  That's good (for the head node), and it
shouldn't be presented back to the compute nodes this way.  The NXDOMAIN
status means "no such domain": the name was not found.
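If you want to check that status field from a script rather than by eye,
something like the following should work.  It is only a sketch: the
function name dig_status is made up for this example, and it assumes the
header line looks like the DiG 9.x output quoted in this thread.

```shell
# dig_status (hypothetical helper): read dig output on stdin and print the
# value of the "status:" field from the HEADER line, e.g. NOERROR or NXDOMAIN.
# Prints nothing if no HEADER line is present (for example on a timeout).
dig_status() {
    sed -n 's/^.*->>HEADER<<-.*status: \([A-Z]*\),.*$/\1/p' | head -n 1
}
```

Usage would be along the lines of "dig network-0-0.local | dig_status".
An empty result corresponds to the "no servers could be reached" case,
where dig never got an answer at all.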

> 
> ;; Query time: 0 msec
> ;; SERVER: 137.132.0.254#53(137.132.0.254)
> ;; WHEN: Wed Mar 22 09:52:42 2006
> ;; MSG SIZE  rcvd: 104
> 
> [root at frontend root]# dig network-0-0.local
> 
> ; <<>> DiG 9.2.4 <<>> network-0-0.local
> ;; global options:  printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 59093
> ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
> 
> ;; QUESTION SECTION:
> ;network-0-0.local.             IN      A
> 
> ;; AUTHORITY SECTION:
> local.                  86400   IN      SOA     ns.local.
> root.ns.local. 1142604062 28800 7200 2419200 86400

This says that the name server for the .local domain (the head node)
doesn't have an answer.  NXDOMAIN is read as "no such domain".

> 
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Wed Mar 22 09:53:03 2006
> ;; MSG SIZE  rcvd: 79
> 
> 
> 

So far so good.

> 
> [root at compute-0-7 root]# qstat -f
> denied: host "network-0-0.local" is neither submit nor admin host
> [root at compute-0-7 root]# dig network-0-0
> 
> ; <<>> DiG 9.2.4 <<>> network-0-0
> ;; global options:  printcmd
> ;; connection timed out; no servers could be reached
> [root at compute-0-7 root]# dig network-0-0.local
> 
> ; <<>> DiG 9.2.4 <<>> network-0-0.local
> ;; global options:  printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 53973
> ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
> 
> ;; QUESTION SECTION:
> ;network-0-0.local.             IN      A
> 
> ;; AUTHORITY SECTION:
> local.                  86400   IN      SOA     ns.local.
> root.ns.local. 1142604062 28800 7200 2419200 86400

Yup.  Not found, and it looks like it tried to query ...

> 
> ;; Query time: 20 msec
> ;; SERVER: 10.1.1.1#53(10.1.1.1)

... the head node.  Great!

> ;; WHEN: Wed Mar 22 09:57:21 2006
> ;; MSG SIZE  rcvd: 79
> 
> 
> 
>>Also, did you turn off the nscd?
>>
>>        /etc/init.d/nscd stop
>>
>>Sometimes this gets in the way of doing the right thing.
> 
> 
> 
> [root at frontend root]# service nscd status
> nscd is stopped
> [root at frontend root]#
> 
> 
> 
>>Also, what does your /etc/nsswitch.conf have for the "hosts" line?
> 
> 
> 
> 
> [root at frontend root]# grep ^hosts /etc/nsswitch.conf
> hosts:      files
> 
> 
> [root at compute-0-7 root]# grep ^hosts /etc/nsswitch.conf
> hosts:      files dns

Hmmm...  DNS showed that it doesn't have a clue who network-0-0 is.
This is good.  Note that compute-0-7 consults its local files
(/etc/hosts) first and then DNS to figure out who network-0-0 is, while
the frontend consults local files only.
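Keep in mind that dig talks to DNS only, so it never sees anything that
comes from the "files" source.  A small sketch for printing the lookup
order on each node follows; the function name hosts_sources is invented
for this example, and the default path is the usual /etc/nsswitch.conf.

```shell
# hosts_sources (hypothetical helper): print the lookup sources listed on the
# "hosts:" line of an nsswitch.conf-style file, in the order they are tried.
# The file defaults to /etc/nsswitch.conf when no argument is given.
hosts_sources() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }' \
        "${1:-/etc/nsswitch.conf}"
}
```

On compute-0-7 this should print "files dns".  To resolve a name through
that same order instead of through DNS alone, "getent hosts
network-0-0.local" is the usual tool, and "grep network-0-0 /etc/hosts"
shows whether a stale entry is hiding in the files source.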

OK, next check.  On compute-0-7 (and on the head node), run:

	cd /opt/gridengine/utilbin/lx26-amd64/
	./gethostbyname network-0-0.local

This is what my cluster says:

[root at compute-0-0 ~]# cd /opt/gridengine/utilbin/lx26-amd64/
[root at compute-0-0 lx26-amd64]# ./gethostbyname network-0-0.local
error resolving host "network-0-0.local": can't resolve host name 
(h_errno = HOST_NOT_FOUND)


Yours should say about the same thing.

If it doesn't, please run this:

	qconf -sel

This is what mine shows:

[root at compute-0-0 lx26-amd64]# qconf -sel
compute-0-0.local
compute-0-1.local
compute-0-2.local

I bet we see a network-0-0 in there somewhere.
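Since the error was about submit and admin hosts, it may be worth checking
those lists too ("qconf -ss" and "qconf -sh") and not only the execution
host list.  A rough sketch for searching any of these lists is below; the
function name host_listed is made up for this example, and it assumes one
host name per line, as in the output above.

```shell
# host_listed NAME (hypothetical helper): read one host name per line on
# stdin and succeed (exit 0) if NAME appears, either exactly or as the short
# form of a fully qualified name (network-0-0 matches network-0-0.local).
# The comparison ignores case.
host_listed() {
    awk -v name="$1" '
        BEGIN { n = tolower(name); missing = 1 }
        { h = tolower($1) }
        h == n || index(h, n ".") == 1 { missing = 0 }
        END { exit missing }
    '
}

# Example use against the three Grid Engine host lists:
#   for flag in -sel -ss -sh; do
#       qconf "$flag" | host_listed network-0-0 && echo "found in qconf $flag"
#   done
```

Any list that reports a match is a place where the old name is still
configured.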

	




> 
> 
> Thanks in advance,
> KL
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615




