[GE users] qstat communication problems

russray rray at semtech.com
Fri Jan 8 16:34:24 GMT 2010


I'm not sure why this fixed it, since I could already ssh to and ping all the hosts, but I added the submission nodes to the /etc/hosts file on the qmaster and now qstat works on the submission nodes.
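
For anyone hitting the same commlib error: the message complains that the name the client IP resolves to differs from the client's own host name, and the empty name in the error suggests the lookup on the qmaster returned nothing. Below is a minimal Python sketch of that kind of comparison, to be run on the qmaster. It is not Grid Engine's actual check, only an approximation under that assumption. The function name check_name_consistency and its parameters are made up for illustration; the socket calls are standard library.

```python
import socket


def check_name_consistency(ip, claimed_name,
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return a list of problems; an empty list means the names agree.

    ip           -- address the client connects from
    claimed_name -- host name the client reports for itself
    reverse/forward default to the system resolver (which consults
    /etc/hosts and DNS according to local configuration); they can be
    replaced with stubs for testing.
    """
    problems = []

    # Step 1: reverse lookup, IP -> name, as seen from this machine.
    try:
        resolved_name, _aliases, _addrs = reverse(ip)
    except OSError as exc:  # socket.herror and socket.gaierror are OSErrors
        return ["reverse lookup of %s failed: %s" % (ip, exc)]

    # Step 2: the resolved name should match what the client calls itself.
    if resolved_name.lower() != claimed_name.lower():
        problems.append("%s resolves to %r, but the client reports %r"
                        % (ip, resolved_name, claimed_name))

    # Step 3: forward lookup, name -> IP, should lead back to the same IP.
    try:
        _name, _aliases, addresses = forward(resolved_name)
    except OSError as exc:
        problems.append("forward lookup of %s failed: %s"
                        % (resolved_name, exc))
    else:
        if ip not in addresses:
            problems.append("%s maps to %s, which does not include %s"
                            % (resolved_name, addresses, ip))

    return problems
```

On the qmaster, calling check_name_consistency("192.168.201.61", "us03linux1.semnet.dom") for each submission node and printing the returned list should show whether that node resolves consistently there. Being able to ping and ssh only shows that the network path works, not that the reverse lookup does.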



reuti <reuti at staff.uni-marburg.de> wrote on 01/08/2010 09:56:54 AM:

> Am 08.01.2010 um 15:20 schrieb russray:
>
> > I think I spoke too soon.  The reboot did work right after the
> > machine booted, but this morning I'm on the machine again and
> > getting the same errors.
> >
> > error: commlib error: access denied (client IP resolved to host
> > name "". This is not identical to clients host name "")
> > error: unable to contact qmaster using port 6444 on host
> > "us03grid1.semnet.dom"
> >
> > I tried telnet us03grid1 6444 and got the following, which makes
> > me think it really is listening; netstat on the qmaster also reports
> > that it is listening:
> >
> > telnet us03grid1 6444
> > Trying 192.168.201.11...
>
> Does the qmaster also have an external interface, and is that the primary one?
>
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=237342
>
> -- Reuti
>
>
> > Connected to us03grid1.semnet.dom (192.168.201.11).
> > Escape character is '^]'.
> > Connection closed by foreign host.
> >
> > On another cluster that isn't having these problems, the qmaster is
> > also listed as an execution host but is not included in any of the
> > queues.  Does the qmaster also need to be an execution host?
> >
> > I'm really baffled by this behavior.
> >
> >
> >
> >
> > russray <rray at semtech.com> wrote on 01/07/2010 03:48:42 PM:
> >
> > >
> > > Well, a reboot of my qmaster server seems to have fixed the problem.
> > > Still not sure what happened, but life is good again.
> > >
> > >
> > >
> > > russray <rray at semtech.com> wrote on 01/07/2010 03:06 PM:
> > >
> > > Finally getting back to this after the holidays.  No iptables on
> > > any of the qmaster, execution, or submission machines.
> > >
> > > reuti <reuti at staff.uni-marburg.de> wrote on 12/18/2009 06:48:47 PM:
> > >
> > > > Am 18.12.2009 um 23:04 schrieb russray:
> > > >
> > > > >
> > > > > I've had a small farm running for several months now, but after
> > > > > what I think was a series of yum updates, my submission nodes can
> > > > > no longer talk to the qmaster.  When I type qstat, I get the
> > > > > following:
> > > > >
> > > > > error: commlib error: access denied (client IP resolved to host
> > > > > name "". This is not identical to clients host name "")
> > > > > error: unable to contact qmaster using port 6444 on host
> > > > > "us03grid1.semnet.dom"
> > > >
> > > > Is there suddenly a firewall in place, and/or did the allowed ports change?
> > > >
> > > > -- Reuti
> > > >
> > > >
> > > > > I can ping and ssh to us03grid1, so that communication seems OK.  If
> > > > > I use gethostbyname, gethostbyaddr, and gethostname, I get the
> > > > > following from the machines in question (using one of them as an
> > > > > example):
> > > > >
> > > > > $SGE_ROOT/utilbin/lx24-x86/gethostbyname us03linux1
> > > > > Hostname: us03linux1.semnet.dom
> > > > > Aliases:  us03linux1
> > > > > Host Address(es): 192.168.201.61
> > > > >
> > > > > $SGE_ROOT/utilbin/lx24-x86/gethostbyaddr 192.168.201.61
> > > > > Hostname: us03linux1.semnet.dom
> > > > > Aliases:  us03linux1
> > > > > Host Address(es): 192.168.201.61
> > > > >
> > > > > $SGE_ROOT/utilbin/lx24-x86/gethostname
> > > > > Hostname: us03linux1.semnet.dom
> > > > > Aliases:  us03linux1
> > > > > Host Address(es): 192.168.201.61
> > > > >
> > > > > $SGE_ROOT/utilbin/lx24-x86/gethostbyname us03grid1
> > > > > Hostname: us03grid1.semnet.dom
> > > > > Aliases:
> > > > > Host Address(es): 192.168.201.11
> > > > >
> > > > > Any clues on what changed to cause this and how to fix it?
> > > > >
> > > > >
> > > > > Russell Ray
> > > > > rray at semtech.com
>


