[GE users] error: commlib error: access denied (client IP resolved to host name "". (was Re: [GE users] Running MPICH jobs)
dag at sonsorol.org
Tue May 2 14:21:25 BST 2006
This reply is not on-topic for this "running MPICH" thread but I
wanted to add my $.02 in here regarding this particular error message.
I see this error occasionally on Apple OS X based clusters. Usually
the main symptom is an SGE admin approaching us to say that "qstat"
will fail at random intervals and then suddenly start working again
within a minute or two. The specific error usually looks like this:
>> error: commlib error: access denied (client IP resolved to host name
>> "". This is not identical to clients host name "")
>> unable to contact qmaster using port 701 on host
>> "xxx.xxx(hostname deleted).xxx"
Whenever I've been able to log in to the system in question I've been
able to confirm the behavior -- sometimes qstat will work, sometimes
it will not and will fail with the error noted above. I have
collectively spent many days trying to fix this error; it
appears randomly on about 5% of the Apple OS X based clusters that I
work on. I've never been able to correlate it to a particular system
configuration and I've never been able to reproduce the error after
"fixing" it. Neither the operating system version nor the
CPU architecture (G4 vs. G5) matters.
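Because the failure comes and goes within a minute or two, confirming it mostly means running qstat over and over and counting how often it fails. A minimal Python sketch of that loop follows. It assumes `qstat` is on the PATH, that it exits with a non-zero status when it cannot contact the qmaster, and that the commlib message is written to stderr; all three are assumptions worth checking on your release. The function name `count_failures` is hypothetical.

```python
import subprocess
import time


def count_failures(cmd: list[str], attempts: int = 60, delay: float = 1.0) -> int:
    """Run cmd `attempts` times, sleeping `delay` seconds between runs, and
    return how many runs exited with a non-zero status."""
    failures = 0
    for i in range(attempts):
        # capture_output keeps the normal job listing off the terminal
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
            # Print the first stderr line (assumed to hold the commlib error text)
            lines = result.stderr.strip().splitlines()
            first = lines[0] if lines else ""
            print(f"attempt {i + 1}: exit {result.returncode}: {first}")
        if i + 1 < attempts:
            time.sleep(delay)
    return failures
```

For example, `count_failures(["qstat"], attempts=120, delay=1.0)` polls for roughly two minutes, which is about the window in which the error was seen to appear and clear.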
In all cases, forward and reverse DNS is functioning perfectly, both
at the /etc/hosts and the DNS resolver levels.
In all cases, all of the SGE utilbin/ binaries also function
perfectly and are able to resolve names and IPs correctly and without error.
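The error text reads as though the connecting client's IP address was resolved to an empty host name that did not match the name the client reported. The forward/reverse consistency described above can be re-checked from a script with a sketch like the one below, which uses only Python's standard `socket` module. It is an approximation under stated assumptions: IPv4 only, the C library resolver (so /etc/hosts and DNS as the system is configured), and a case-insensitive comparison of short host names. Grid Engine's commlib does its own resolution and comparison, so a clean result here does not rule the problem out, which matches the experience above. The names `short_name` and `check_forward_reverse` are hypothetical.

```python
import socket


def short_name(name: str) -> str:
    """Lower-cased first label of a host name, e.g. 'Node01.example.org' -> 'node01'."""
    return name.split(".", 1)[0].lower()


def check_forward_reverse(hostname: str) -> list[str]:
    """Resolve hostname to its IPv4 addresses, resolve each address back to a
    name, and return a list of problems (an empty list means they agree)."""
    try:
        # Forward lookup: name -> (canonical name, aliases, IPv4 addresses)
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return [f"forward lookup of {hostname!r} failed: {exc}"]

    problems = []
    for ip in addresses:
        try:
            # Reverse lookup: IP -> (primary name, aliases, addresses)
            primary, aliases, _ = socket.gethostbyaddr(ip)
        except OSError as exc:
            problems.append(f"reverse lookup of {ip} failed: {exc}")
            continue
        if not primary:
            # The symptom in the commlib message: an IP that maps to ""
            problems.append(f"{ip} resolved to an empty host name")
            continue
        reverse_names = {short_name(primary)} | {short_name(a) for a in aliases}
        if short_name(hostname) not in reverse_names:
            problems.append(f"{ip} resolved to {primary!r}, not {hostname!r}")
    return problems
```

Running `check_forward_reverse(socket.gethostname())` on the submit host and on the qmaster host, ideally in the same loop as the qstat failures, would show whether the system resolver ever disagrees with itself while qstat is failing.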
Over the past year or so, I've been able to fix this issue on about
50% of the SGE systems showing the behavior simply by dropping new or
updated courtesy binaries into place. The remaining 50% of the
clusters are not fixed by this and continue to show the odd behavior
even when the latest binaries are dropped into place.
For those systems not fixed by new binaries, the only way (after
*much* trial and error and experimentation) I've been able to
conclusively make the problem go away is to build Grid Engine from
source on the affected system. Hand-built binaries installed into
$SGE_ROOT have always cleared the issue. This is the only "fix" that
works for us right now for this particular issue.
This is a real issue that I've seen on multiple different (Apple)
systems, but since I can't figure out the root cause or "fix" it by
any means other than rebuilding from source code, I've never filed an
Issue report. If I ever learn more I'll open up something in the
Anyway, like I said this is not on topic for the thread but the error
message quoted below brought back bad memories (heh!) and I thought
I'd send a note so it would get listed in the archives. Maybe this
will help someone doing a google or archive search on "access denied
(client IP resolved to host name """ in the future.
On May 2, 2006, at 5:33 AM, Reuti wrote:
>> error: commlib error: access denied (client IP resolved to host
>> name "". This is not identical to clients host name "")