[GE users] new exec host grief
james.gibbon at nottingham.ac.uk
Fri Feb 13 13:38:17 GMT 2009
Thanks for the reply.
On Fri, 13 Feb 2009 07:23:13 -0500
craffi <dag at sonsorol.org> wrote:
> Hi James,
> The root cause seems to be that the compute node can't get to port 701
> on host "linux6" - you should look into the standard firewall,
> routing, DNS lookup and other issues that typically can cause "can't
> get to host X, port Y" type problems.
Yes that was my thought yesterday and I did look into that - I should
have made that clear, sorry.
Anyway: there is no firewall between the exec host and the master,
and I can telnet to port 701 from the exec host to the master.
DNS and /etc/hosts are correct on both machines, and the hostname as
given in act_qmaster resolves to the correct IP address. There
are a few execd_messages log files in /tmp/, but they only restate
the same information, i.e.:
02/12/2009 20:54:51| main|minitel|E|can't connect to service
02/12/2009 20:54:51| main|minitel|E|can't get configuration from qmaster -- backgrounding
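For anyone repeating the two checks described above (name lookup, then a TCP connection to the qmaster port), they can be combined into one step. This is a minimal Python sketch and not part of Grid Engine: check_qmaster is a hypothetical helper, and the host name and port are whatever act_qmaster and the local installation specify (linux6 and 701 in this thread).

```python
import socket


def check_qmaster(host, port=701, timeout=5.0):
    """Resolve `host`, then try a TCP connection to `port`.

    Returns a tuple (ip, ok, detail). `ip` is None if the lookup failed.
    """
    try:
        # Uses the system resolver, so /etc/hosts and DNS are both consulted.
        ip = socket.gethostbyname(host)
    except OSError as exc:
        return None, False, f"name lookup failed: {exc}"
    try:
        # Open and immediately close a connection, much like a telnet test.
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True, "connected"
    except OSError as exc:
        return ip, False, f"connect failed: {exc}"
```

Run on the exec host as check_qmaster("linux6", 701). As with telnet, a successful connection only shows that the port is reachable; it does not show that the qmaster accepts the exec host's identity.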
I've ensured all the gridengine processes are zapped on minitel (the
exec host) before attempting to restart gridengine-exec.
Nothing relevant-looking in the qmaster's logs.
One possibly significant thing is that the entry in the 'exec hosts'
directory (/spool/qmaster/exec_hosts) on the qmaster has the FQDN of
minitel as its filename and in its 'hostname' field. Could that be
relevant?
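To make the question concrete, the name the exec host reports for itself can be compared with the name spooled on the qmaster. The sketch below is illustrative only: compare_host_names is a hypothetical helper, the domain in the usage note is made up, and whether a short-name/FQDN difference actually matters to Grid Engine is an assumption to be verified, not something this code establishes.

```python
def compare_host_names(reported, spooled):
    """Classify how a host's self-reported name relates to the spooled name."""
    a = reported.strip().lower()
    b = spooled.strip().lower()
    if a == b:
        return "identical"
    # Compare only the first label, e.g. "minitel" in "minitel.example.org".
    if a.split(".")[0] == b.split(".")[0]:
        return "same short name, different qualification"
    return "different hosts"
```

For example, compare_host_names("minitel", "minitel.example.org") reports the same short name with different qualification. The first argument could come from socket.gethostname() run on the exec host, the second from the file name under exec_hosts.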
James Gibbon BSc MBCS
Brain and Body Centre
University of Nottingham
Nottingham NG7 2RD
+44 115 846 8255