[GE users] reresolve hostname failed: can't resolve hostname with SGE 6.1

Sam Skipsey sskipsey at nesc.ac.uk
Mon Jun 2 14:06:46 BST 2008


Reuti wrote:
> Hi,
> 
> On 02.06.2008 at 14:50, Sam Skipsey wrote:
> 
>> I have the following problem:
>>
>> We have a cluster, most of which is running on RHEL4 64bit, including 
>> the qmaster, and have lx26-amd64 SGE installed.
>> We also have a couple of nodes which have to run RHEL3 32bit for 
>> compatibility reasons. These, of course, run the relevant release of 
>> SGE, but for lx24_x86. They are configured as submit hosts.
>>
>> This setup worked perfectly with SGE 6.0.
>>
>> Recently, we upgraded to SGE 6.1, and also introduced a shadow host 
>> for more stability - the new qmaster and shadow are on different 
>> machines to the old 6.0 qmaster (which is now removed).
>>
>> On the lx24_x86 nodes, we now get:
>>
>> # qstat -f
>> reresolve hostname failed: can't resolve host name
>>
>> and the same for other SGE commands.
>>
>> Interestingly,
>>
>> $SGE_ROOT/default/common/act_qmaster contains the hostname of the 
>> qmaster (we checked)
>> and
>> $SGE_ROOT/utilbin/lx24_x86/gethostbyname <hostname of qmaster>
>> returns the correct IP
>> similarly,
>> $SGE_ROOT/utilbin/lx24_x86/gethostbyaddr <IP of qmaster>
>> returns the correct hostname.
>>
>> Does anyone have any ideas, since we appear to have ruled out the 
>> obvious network issues?
> 
> Just out of curiosity: what is the entry in /etc/nsswitch.conf? Ping and 
> all the other stuff is working?

Ping and all that stuff was working, yes.

> Any orphaned entry in 
> $SGE_ROOT/default/common/host_aliases?
> 

No - actually, someone's random fiddling fixed matters just after I sent 
this mail.

Our RHEL3 boxes have domain names of the form name1.name2.domainname 
(that is, their "hostname" is actually in a subdomain of the cluster 
domain). Originally, our /etc/hosts had a line

<routable IP>  name1.name2.domainname

which worked for SGE 6.0.

For 6.1, it seems we need the short host name as an alias on the same line:

<routable IP>  name1.name2.domainname name1

We have no idea why this works, but it does. (Thanks for the 
suggestions, though.)
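
For anyone wanting to check their own hosts file for the same pattern, here is a minimal sketch in Python. It only inspects the text of a hosts file and flags entries where a fully qualified name is listed without its short name as an alias. The function names are made up for illustration, and whether SGE 6.1 actually requires the alias in general is an assumption based on the single case above, not something confirmed here.

```python
def hosts_entries(text):
    """Yield (ip, canonical_name, aliases) for each entry in hosts-file text."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: an address with no name
        yield fields[0], fields[1], fields[2:]


def missing_short_aliases(text):
    """Return fully qualified names whose entry lacks the short host name.

    Hypothetical helper: the "short name" is taken to be everything
    before the first dot of the first name on the line.
    """
    missing = []
    for _ip, canonical, aliases in hosts_entries(text):
        short = canonical.split(".", 1)[0]
        if "." in canonical and short not in aliases:
            missing.append(canonical)
    return missing
```

Passing the contents of /etc/hosts to missing_short_aliases() would list our original line (name1.name2.domainname with no alias) and return nothing for the corrected one.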

Sam

> -- Reuti
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
> 

