[GE users] act_qmaster contents changing

reuti reuti at staff.uni-marburg.de
Fri Nov 27 11:14:09 GMT 2009


Hi,

On 27.11.2009 at 10:50, robhorton wrote:

> Thanks for your comments.
>
> On Thu, 2009-11-26 at 17:27 +0100, petrik wrote:
>>> After installation, act_qmaster contains "taurus.local" (which is
>>> the /etc/hosts entry for the private interface).
>> You later say that /usr/local/sge/utilbin/lx24-amd64/gethostname does
>> not return this value. Correct? What does gethostname -aname return?
>> Please also attach the output of gethostname -all.
>
> Yes, I get the following (<fqdn> stands in for the real domain):
>
> [root@taurus ~]# /usr/local/sge/utilbin/lx24-amd64/gethostname -aname
> taurus.<fqdn>
>
> [root@taurus ~]# /usr/local/sge/utilbin/lx24-amd64/gethostname -all
> Hostname: taurus.<fqdn>
> SGE name: taurus.<fqdn>
> Aliases:
> Host Address(es): <public ip>
>
>
>>> =============================================================
>>> [root@taurus ~]# /usr/local/sge/default/common/sgemaster start
>>>
>>> sge_qmaster didn't start!
>>> This is not a qmaster host!
>>> Please, check your act_qmaster file!
>>>
>> This happens when gethostname -aname and act_qmaster do not match.
>
> I can understand that, but I'm not sure why the content of act_qmaster
> reverts each time the qmaster is restarted.
>
>> Do the public and local names differ only in the domain? There have
>> been some bug fixes in this area for the upcoming 6.2u5.
>
> Yes.
>
> Reuti's suggestion of adding a $SGE_ROOT/default/common/host_aliases
> entry seems to make it behave as expected (thanks), although this
> doesn't seem to have been necessary on other clusters with a similar
> setup, so I'd still be interested to know what's happening.

SGE knows a host under the name of only one interface, usually the
one bound to eth0. Some installations simply give the internal and the
external interface the same name; in that case SGE can run without a
special host_aliases file.
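
The effect of a host_aliases entry can be modelled in a few lines. The
sketch below is a hypothetical Python model, not SGE code: it assumes
the sge_host_aliases(5) layout of one host per line, with the name SGE
should use first and its aliases after it, and (as an assumption) it
treats lines starting with # as comments. The host names are the ones
from this thread, with <fqdn> kept as the redacted placeholder.

```python
from typing import Dict


def parse_host_aliases(text: str) -> Dict[str, str]:
    """Map every name on a host_aliases line to the first (unique) name.

    Hypothetical model: blank lines are skipped, and lines starting with
    '#' are treated as comments (the comment handling is an assumption).
    """
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        names = line.split()
        unique = names[0]  # first name on the line is the one SGE uses
        for name in names:
            mapping[name] = unique
    return mapping


def sge_name(hostname: str, aliases: Dict[str, str]) -> str:
    """Return the name SGE would use; names without an entry pass through."""
    return aliases.get(hostname, hostname)


# One line declaring the public name as an alias of the private one.
HOST_ALIASES = "taurus.local taurus.<fqdn>\n"

aliases = parse_host_aliases(HOST_ALIASES)
print(sge_name("taurus.<fqdn>", aliases))  # taurus.local
print(sge_name("other.host", aliases))     # other.host
```

With such an entry, the public name that the host reports resolves to
the private name recorded at installation, which is what the
"SGE name:" line of gethostname -all is meant to show.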

As the name in act_qmaster is determined at installation time, I
would assume that you configured the external interface after you
installed SGE, and so the name of the machine (which is derived from
eth0) changed. Otherwise you would have observed that the qmaster
starts up (known under the public name), but the nodes can't contact
it.
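
The startup failure quoted above comes down to a comparison made when
sgemaster starts: act_qmaster is written once at installation, while
the host's name is looked up again on every start. The following is a
hypothetical Python model of that check (the function name and the
alias dictionary are illustrative; the real check lives in the
sgemaster shell script). It uses an exact string comparison as a
simplification and shows why a rename after installation breaks
startup and why an alias entry repairs it.

```python
from typing import Dict, Optional


def is_qmaster_host(act_qmaster: str, current_name: str,
                    aliases: Optional[Dict[str, str]] = None) -> bool:
    """Hypothetical model: does this host's SGE name equal act_qmaster?

    current_name stands for the name the host reports now (derived from
    eth0); aliases maps alias -> unique name, as host_aliases would.
    Exact, case-sensitive comparison is a simplification.
    """
    aliases = aliases or {}
    resolved = aliases.get(current_name, current_name)
    return resolved == act_qmaster.strip()


# act_qmaster was written at installation time, before the rename.
act_qmaster = "taurus.local\n"

# After the external interface was configured, the host reports its public name.
print(is_qmaster_host(act_qmaster, "taurus.<fqdn>"))  # False: "This is not a qmaster host!"

# With an alias mapping the public name back to the recorded one:
print(is_qmaster_host(act_qmaster, "taurus.<fqdn>",
                      {"taurus.<fqdn>": "taurus.local"}))  # True
```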

You can find some further explanation here:
http://gridengine.sunsource.net/howto/multi_intrfcs.html

-- Reuti


>
> Thanks,
> Rob
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=229737
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=229753
