[GE users] HOST_NOT_RESOLVABLE

neoideo axischire at gmail.com
Tue Apr 13 16:37:41 BST 2010



It would also be useful if you showed us the /etc/hosts files from both the qmaster host and the exec hosts.
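
To make the two files easy to compare, something like the following could be run on each machine. It is only a sketch: hosts_entries is a made-up helper name (not part of Grid Engine), and it assumes a POSIX shell with awk. It prints each hosts-file line that lists the given name as a hostname or alias, with comments stripped and whitespace normalised.

```shell
# Print every entry of a hosts-format file that lists NAME as a
# hostname or alias. Comments are dropped and fields are re-joined
# with single spaces so output from different machines can be compared.
# Usage: hosts_entries NAME [FILE]     (FILE defaults to /etc/hosts)
hosts_entries() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }                      # strip trailing comments
        {
            # field 1 is the address; fields 2..NF are names and aliases
            for (i = 2; i <= NF; i++)
                if ($i == name) { $1 = $1; print; next }
        }
    ' "$file"
}
```

For example, running "hosts_entries cn002" and "hosts_entries ham4" on both ham4 and cn002 should print the same address for each name on both machines; no output at all means the name is not in that machine's hosts file.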
Cristobal




On Tue, Apr 13, 2010 at 5:52 AM, reuti <reuti at staff.uni-marburg.de> wrote:
On 12.04.2010 at 18:12, henk wrote:

> I have installed 6.2u5 and ran into a problem with hostname resolution.
>
> qmaster is running on the head node and execd has started on one of the
> compute nodes. The error message in the qmaster messages file is this:
>
> 04/12/2010 17:01:38|listen|ham4|E|commlib error: can't resolve host name
> (can't resolve rdata hostname "cn002")
> 04/12/2010 17:01:38|listen|ham4|E|commlib error: local host name error
> (remote rdata host name "cn002" is not equal to local resolved host name
> "(HOST_NOT_RESOLVABLE)")
>
> On stopping the execd on the compute node this message is produced
>
> error: commlib error: access denied (server host resolves rdata host
> "cn002" as "(HOST_NOT_RESOLVABLE)")
> ERROR: unable to contact qmaster using port 6444 on host "ham4 "
>   Shutting down Grid Engine execution daemon
> ls: cannot access /cn002/active_jobs: No such file or directory

Is

$ nslookup cn002

working? Do the tools $SGE_ROOT/utilbin/lx24-amd64/gethostbyname and gethostbyaddr give correct results for both the head node and the exec host?
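
Note that nslookup queries DNS directly and does not read /etc/hosts, so it can disagree with what the system resolver returns. As a rough cross-check, the sketch below does a forward lookup followed by a reverse lookup through getent, which uses the sources configured in /etc/nsswitch.conf. It assumes a Linux host with glibc's getent; check_host is a hypothetical helper name, and this is not a replacement for the Grid Engine utilbin tools.

```shell
# Forward-resolve NAME, reverse-resolve the resulting address, and
# report whether the reverse lookup gives back the same name.
# Returns 0 only when both lookups succeed and the names match.
check_host() {
    name=$1
    # getent exits non-zero when the name cannot be resolved
    fwd=$(getent hosts "$name") || { echo "$name: not resolvable"; return 1; }
    # the first field of the first result line is the address
    addr=$(printf '%s\n' "$fwd" | awk 'NR == 1 { print $1 }')
    # the second field of the reverse result is the canonical name
    rev=$(getent hosts "$addr" | awk 'NR == 1 { print $2 }')
    echo "$name -> $addr -> ${rev:-<no reverse entry>}"
    [ "$rev" = "$name" ]
}
```

Running "check_host cn002" on the head node and "check_host ham4" on the compute node shows what each side resolves. A mismatch is not always an error (the reverse lookup may return a fully qualified name), but it is worth comparing against the name that appears in the qmaster messages file.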

Do you have two network cards in the headnode?

-- Reuti


> Does the commlib error come from the master host as it listens on port
> 6444 and therefore the compute node cannot contact the master?
>
> I have tried modifications to the configuration file and replaced the
> value "builtin" with this but without effect:
>
> qlogin_command               telnet
> qlogin_daemon                /usr/sbin/in.telnetd
> rlogin_command               /usr/bin/ssh -Y
> rlogin_daemon                /usr/sbin/sshd -i
> rsh_command                  /usr/bin/ssh
> rsh_daemon                   /usr/sbin/sshd -i
>
> I have only found a few references to the above error and they referred
> to use of qrsh.
>
> Does anyone have an idea what is wrong here?
>
> Thanks in advance
>
> Henk
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=253146
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=253227




