[GE users] qstat communication problems

russray rray at semtech.com
Fri Jan 8 14:20:28 GMT 2010


I think I spoke too soon.  The reboot did help right after the machine came back up, but this morning I'm on the machine again and getting the same errors:

error: commlib error: access denied (client IP resolved to host name "". This is not identical to clients host name "")
error: unable to contact qmaster using port 6444 on host "us03grid1.semnet.dom"
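
The commlib message complains that the name obtained from the client's IP does not match the name the client reports for itself. A minimal sketch of that forward/reverse comparison in Python follows, assuming it is run on the qmaster against the name of a submission host. The function name check_name_consistency and its injectable resolve/reverse parameters are hypothetical and only for illustration; this is not how commlib itself performs the check.

```python
import socket
import sys


def check_name_consistency(name,
                           resolve=socket.gethostbyname_ex,
                           reverse=socket.gethostbyaddr):
    """Forward-resolve `name`, reverse-resolve each address, compare names.

    Returns (canonical_name, [(address, reverse_name, matches), ...]).
    `resolve` and `reverse` are injectable so the logic can be exercised
    without touching the real resolver (hypothetical convenience).
    """
    canonical, _aliases, addresses = resolve(name)
    results = []
    for addr in addresses:
        try:
            rev_name, _rev_aliases, _rev_addrs = reverse(addr)
        except (socket.herror, socket.gaierror):
            # No reverse entry at all: report an empty name, as in the
            # "resolved to host name \"\"" part of the error message.
            results.append((addr, "", False))
            continue
        matches = rev_name.lower() == canonical.lower()
        results.append((addr, rev_name, matches))
    return canonical, results


if __name__ == "__main__":
    # Example: python check_names.py us03linux1 us03grid1
    for host in sys.argv[1:]:
        canonical, results = check_name_consistency(host)
        for addr, rev_name, matches in results:
            status = "ok" if matches else "MISMATCH"
            print(f"{host}: {canonical} -> {addr} -> {rev_name!r} [{status}]")
```

A mismatch or an empty reverse name on the qmaster side would point at /etc/hosts, DNS, or the nsswitch lookup order on that machine rather than at the network path.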

I tried telnet us03grid1 6444 and got the following, which makes me think the qmaster really is listening. netstat on the qmaster also reports that it is listening:

telnet us03grid1 6444
Trying 192.168.201.11...
Connected to us03grid1.semnet.dom (192.168.201.11).
Escape character is '^]'.
Connection closed by foreign host.
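
The telnet session shows the TCP connection being accepted and then dropped by the remote side, which is a different situation from a refused or filtered port. A rough Python sketch that tells these cases apart is below. The function name probe_port and its result strings are hypothetical; it only looks at TCP behaviour and knows nothing about the Grid Engine protocol.

```python
import socket
import sys


def probe_port(host, port, timeout=5.0):
    """Classify how a TCP port responds.

    Returns one of: "refused", "timeout", "error",
    "closed_by_peer" (accepted, then the peer hung up),
    "open" (accepted and held open, or the peer sent data).
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"          # nothing listening on that port
    except socket.timeout:
        return "timeout"          # typical of a packet-dropping firewall
    except OSError:
        return "error"            # name lookup failure, no route, etc.

    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(1)
        except socket.timeout:
            return "open"         # peer kept the connection and stayed silent
        except ConnectionResetError:
            return "closed_by_peer"
        # An empty read means the peer closed the connection cleanly.
        return "closed_by_peer" if data == b"" else "open"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python probe_port.py us03grid1 6444
    print(probe_port(sys.argv[1], int(sys.argv[2])))
```

A "closed_by_peer" result would match the telnet output above: the port is reachable and something is listening, so a firewall is less likely to be the cause than a check made by the listener after the connection is accepted.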

On another cluster that isn't having these problems, the qmaster is also listed as an execution host but is not included in any of the queues.  Does the qmaster also need to be an execution host?

I'm really baffled by this behavior.




russray <rray at semtech.com> wrote on 01/07/2010 03:48:42 PM:

>
> Well, a reboot of my qmaster server seems to have fixed the problem.
> Still not sure what happened, but life is good again.
>
>
>
> russray <rray at semtech.com> wrote on 01/07/2010 03:06 PM:
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] qstat communication problems
>
> Finally getting back to this after the holidays.  No iptables on
> the qmaster, execution, or submission machines.
>
> reuti <reuti at staff.uni-marburg.de> wrote on 12/18/2009 06:48:47 PM:
>
> > Am 18.12.2009 um 23:04 schrieb russray:
> >
> > >
> > > I've had a small farm running for several months now, but after
> > > what I think was  a series of yum updates, my submission nodes can
> > > no longer talk to the qmaster.  When I type qstat, I get the
> > > following:
> > >
> > > error: commlib error: access denied (client IP resolved to host
> > > name "". This is not identical to clients host name "")
> > > error: unable to contact qmaster using port 6444 on host
> > > "us03grid1.semnet.dom"
> >
> > Any firewall suddenly in place and/or changing the allowed ports?
> >
> > -- Reuti
> >
> >
> > > I can ping and ssh to us03grid1, so that communication seems ok.  If
> > > I use gethostbyname, gethostbyaddr, gethostname, I get the
> > > following from the machines in question (using one of them as an
> > > example):
> > >
> > > $SGE_ROOT/utilbin/lx24-x86/gethostbyname us03linux1
> > > Hostname: us03linux1.semnet.dom
> > > Aliases:  us03linux1
> > > Host Address(es): 192.168.201.61
> > >
> > > $SGE_ROOT/utilbin/lx24-x86/gethostbyaddr 192.168.201.61
> > > Hostname: us03linux1.semnet.dom
> > > Aliases:  us03linux1
> > > Host Address(es): 192.168.201.61
> > >
> > > $SGE_ROOT/utilbin/lx24-x86/gethostname
> > > Hostname: us03linux1.semnet.dom
> > > Aliases:  us03linux1
> > > Host Address(es): 192.168.201.61
> > >
> > > $SGE_ROOT/utilbin/lx24-x86/gethostbyname us03grid1
> > > Hostname: us03grid1.semnet.dom
> > > Aliases:
> > > Host Address(es): 192.168.201.11
> > >
> > > Any clues on what changed to cause this and how to fix it?
> > >
> > >
> > > Russell Ray
> > > rray at semtech.com


