[GE users] access denied (client IP resolved to host name "". This is not identical to clients host name "")

John Saalwaechter johnsaalwaechter at yahoo.com
Thu Jun 15 14:58:12 BST 2006


I know this is a bit of an old thread, but we
finally found the culprit that was causing this
problem in our environment, so I wanted to share.

Basically we were being impacted by the bug noted in
http://gridengine.sunsource.net/issues/show_bug.cgi?id=1661
(we're running 6.0u4).  But we couldn't figure out where
SGE commands were being run using IP addresses instead of
hostnames.  The answer turned out to be Nagios.  We were
using qping in a Nagios check to verify that the qmaster
was up.  Our Nagios command originally looked like this:

   /usr/lib/nagios/plugins/check_qmaster -H $HOSTADDRESS$

We had no particular reason to check by IP address rather
than by hostname.  Once we connected the SGE problems with
this Nagios check, we changed the Nagios command to:

   /usr/lib/nagios/plugins/check_qmaster -H $HOSTNAME$

Now the problem is gone for us.
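
For anyone who wants to guard against the same mistake, here is a
minimal sketch of a check that could sit at the top of a plugin like
ours.  The function names are made up for illustration and are not
part of SGE or Nagios; it only handles IPv4 literals, and it assumes
the plugin receives the target as its -H argument.  The one thing
taken from Nagios itself is the plugin return code 3 for UNKNOWN.

```shell
#!/bin/sh
# Hypothetical guard for a qmaster check plugin (illustrative names only).

# Succeeds when the argument looks like a dotted-quad IPv4 literal.
# This is a syntactic test only; it does not validate octet ranges.
is_ipv4_literal() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Prints the host name unchanged, or refuses an IP literal.
# Returns 3 (the Nagios plugin code for UNKNOWN) when given an IP address,
# so that the check never contacts the qmaster by address.
require_host_name() {
    if is_ipv4_literal "$1"; then
        echo "UNKNOWN: pass a host name, not the IP address $1"
        return 3
    fi
    echo "$1"
}
```

In the plugin, the value given to -H would go through
require_host_name before it is handed to qping, and the plugin would
exit with status 3 if the function fails.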

John

--- John Saalwaechter <johnsaalwaechter at yahoo.com> wrote:

> For what it's worth, my qmaster system has this same problem
> all the time.  I'd say that more than 50% of the time any
> SGE command run from the qmaster results in the error message.
> This problem only happens on our qmaster, so I've worked around
> it by always using another host to do SGE admin work.
> 
> Of note is the fact that this is a SPARC V880 running Solaris 9
> and N1GE 6.0u4.  Like Chris, I've checked and rechecked all
> DNS and /etc/hosts entries, but I cannot find any problems there.
> 
> Chris -- can you explain in more detail your comments below
> about /etc/hosts?  My system is not behind any private network,
> but we do have a private link on this host for NFS connectivity
> to $SGE_ROOT.
> 
> Also, when I get the error, it's also accompanied by this:
> ERROR: unable to contact qmaster using port 537 on host "xxx"
> 
> John
> 
> --- Chris Dagdigian <dag at sonsorol.org> wrote:
> 
> > 
> > I got lucky today.
> > 
> > For the first time ever on a non-Apple OS X system I was able to  
> > recreate the mysterious
> > 
> >   access denied (client IP resolved to host name "". This is not  
> > identical to clients host name "")
> > 
> > ... error
> > 
> > To further make things more fun, the error condition also produces
> > another bug-worthy case of non-compliant XML output, the empty "<>"
> > tags break automated XML parsers.
> > 
> > Check this out:
> > 
> > > [dag at test xmlqstat]$ qstat -f -xml -j 1
> > > error: commlib error: access denied (client IP resolved to host name "". This is not identical to clients host name "")
> > > <?xml version='1.0'?>
> > > <comunication_error xmlns:xsd="http://www.w3.org/2001/XMLSchema">
> > >   <>
> > >     <AN_status>11</AN_status>
> > >     <AN_text>unable to contact qmaster using port 701 on host "test.gridengine.info"</AN_text>
> > >     <AN_quality>0</AN_quality>
> > >   </>
> > > </comunication_error>
> > > *** glibc detected *** double free or corruption (fasttop): 0x0000000040254440 ***
> > > Aborted
> > > [dag at test xmlqstat]$
> > 
> > This was in the qmaster messages spool file:
> > > 05/04/2006 17:39:43|qmaster|test|E|commlib error: local host name error (can't resolve client IP address)
> > 
> > This is on a single CPU Opteron system running Centos 4 and SGE  
> > courtesy binaries downloaded about 30 minutes ago (SGE 6.0u7)
> > 
> > This system has good DNS and working utilbin/ binaries but it did not
> > have an entry in /etc/hosts with the public IP and fully qualified
> > hostname.
> > 
> > Shortly after making the /etc/hosts entry the problem went away.
> > 
> > In my experience with this error in the past, it's always been a
> > transient "comes and goes" issue. I'm hoping the /etc/hosts addition
> > resolved the problem but it would also be nice if it does not since
> > this is a testing box that I can use for further tracing and
> > debugging if needed. I'm also going to see if I can find the bits of
> > source code that may be producing the bad XML output for this error
> > condition.
> > 
> > -Chris
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> > 
> > 
> 
> 
> --
> johnsaalwaechter at yahoo.com
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around 
> http://mail.yahoo.com 
> 
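
The /etc/hosts fix described in the quoted message can be
sanity-checked with a forward and reverse lookup round trip.  The
sketch below is only an approximation: it uses the system resolver
through getent, not SGE's own commlib or the utilbin/ tools, and the
function names are hypothetical.  It assumes getent is available and
that the second field of "getent hosts <address>" is the name the
address resolves to.

```shell
#!/bin/sh
# Hypothetical helper, not part of SGE: forward/reverse lookup round trip.

# Lowercase a name and strip one trailing dot, so that case differences
# or a rooted FQDN do not count as a mismatch.
normalize_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//'
}

# Succeeds when two host names are identical after normalization.
same_host_name() {
    [ "$(normalize_name "$1")" = "$(normalize_name "$2")" ]
}

# Resolve a name to an address, resolve that address back to a name,
# and report whether the result matches the name we started with.
# Returns 0 on a match, 1 on a mismatch, 2 if the name has no address.
check_round_trip() {
    name=$1
    addr=$(getent hosts "$name" | awk 'NR == 1 { print $1 }')
    if [ -z "$addr" ]; then
        echo "no address found for $name"
        return 2
    fi
    back=$(getent hosts "$addr" | awk 'NR == 1 { print $2 }')
    if same_host_name "$name" "$back"; then
        echo "OK: $name -> $addr -> $back"
        return 0
    fi
    echo "MISMATCH: $name -> $addr -> ${back:-<nothing>}"
    return 1
}
```

Running check_round_trip with the fully qualified name of the qmaster
host, on the qmaster itself, shows whether the address maps back to
the same name.  A mismatch here does not prove SGE will fail, since
SGE does its own resolution, but it is a cheap first thing to look at.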






