henk h.a.slim at durham.ac.uk
Mon Apr 12 17:12:56 BST 2010

I have installed Grid Engine 6.2u5 and have run into a problem with
hostname resolution.

qmaster is running on the head node and execd has started on one of the
compute nodes. The error message in the qmaster messages file is this:

04/12/2010 17:01:38|listen|ham4|E|commlib error: can't resolve host name
(can't resolve rdata hostname "cn002")
04/12/2010 17:01:38|listen|ham4|E|commlib error: local host name error
(remote rdata host name "cn002" is not equal to local resolved host name

On stopping the execd on the compute node, this message is produced:

error: commlib error: access denied (server host resolves rdata host
"cn002" as "(HOST_NOT_RESOLVABLE)")
ERROR: unable to contact qmaster using port 6444 on host "ham4 "
   Shutting down Grid Engine execution daemon
ls: cannot access /cn002/active_jobs: No such file or directory

Does the commlib error originate on the master host, which listens on
port 6444, and is that why the compute node cannot contact the master?
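
To narrow this down, name resolution can be checked on each host
independently of Grid Engine. The following is only a minimal sketch,
assuming Python 3 is available on the head node and the compute node. The
function name check_resolution and its resolver/reverse parameters are
hypothetical names chosen for illustration (the parameters exist only so
the lookups can be swapped out); it uses the system resolver through the
standard socket module, which may not behave exactly as Grid Engine's own
lookup does.

```python
import socket


def check_resolution(hostname,
                     resolver=socket.gethostbyname_ex,
                     reverse=socket.gethostbyaddr):
    """Do a forward lookup of hostname, then a reverse lookup of the
    first address found, and report what the system resolver returns."""
    result = {
        "hostname": hostname,
        "addresses": [],       # IPv4 addresses from the forward lookup
        "reverse_name": None,  # primary name from the reverse lookup
        "error": None,         # resolver error text, if any lookup failed
    }
    try:
        # gethostbyname_ex returns (primary_name, aliases, ip_addresses)
        _, _, addresses = resolver(hostname)
        result["addresses"] = list(addresses)
        if addresses:
            # gethostbyaddr returns (primary_name, aliases, ip_addresses)
            result["reverse_name"] = reverse(addresses[0])[0]
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError
        result["error"] = str(exc)
    return result
```

Running print(check_resolution("cn002")) on the head node and
print(check_resolution("ham4")) on the compute node should show whether
each side can resolve the other, and whether the name returned by the
reverse lookup matches the short name that appears in the error messages.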

I have tried modifying the configuration, replacing the value "builtin"
with the following, but without effect:

qlogin_command               telnet
qlogin_daemon                /usr/sbin/in.telnetd
rlogin_command               /usr/bin/ssh -Y
rlogin_daemon                /usr/sbin/sshd -i
rsh_command                  /usr/bin/ssh
rsh_daemon                   /usr/sbin/sshd -i

I have found only a few references to the above error, and they referred
to the use of qrsh.

Does anyone have an idea what is wrong here?

Thanks in advance


