[GE users] Installation issues

craffi dag at sonsorol.org
Mon Jul 13 11:53:11 BST 2009


Hello,

A few suggestions ...

If you really want to see how SGE views your hostname and DNS
environment, spend some time running the programs in
utilbin/lx24-amd64/ -- the 'gethostname' and related tools are what the
SGE scripts actually use.
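
As a rough sketch of what I mean, something like the following runs the SGE helpers directly. The function name sge_resolver_check is just made up for illustration, and I am assuming the lx24-amd64 architecture directory from your post and that 'gethostbyname' accepts the name to look up as its argument.

```shell
#!/bin/sh
# Hypothetical helper: run SGE's own resolver tools from a utilbin directory.
# $1 = directory holding the tools, $2 = host name to look up.
sge_resolver_check() {
    bindir="$1"
    name="$2"
    if [ ! -x "$bindir/gethostname" ]; then
        echo "missing: $bindir/gethostname"
        return 1
    fi
    echo "== gethostname =="
    "$bindir/gethostname"
    # gethostbyname is only run if it is present in the same directory.
    if [ -x "$bindir/gethostbyname" ]; then
        echo "== gethostbyname $name =="
        "$bindir/gethostbyname" "$name"
    fi
}

# Example call (assumes SGE_ROOT is set in your environment):
#   sge_resolver_check "$SGE_ROOT/utilbin/lx24-amd64" "$(hostname)"
```

If these tools fail here, the installer will fail the same way, so this is the quickest place to test a fix.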

For some more practical advice ...

You posted your qmaster /etc/hosts file, which looks OK, but you did not
mention the actual system hostname. My guess, based on the error you
quote below, is that the hostname your server thinks it is operating
under is not listed in /etc/hosts.

What happens when you type "hostname"? Is that exact name listed in
the hosts file or available via DNS?
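
A minimal way to check the hosts file part, assuming a standard /etc/hosts layout (address first, then names, '#' starting a comment). The function name host_in_hosts_file is hypothetical, and the comparison is an exact, case-sensitive string match:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given name appears as a host name
# (not as an address, not inside a comment) in a hosts-style file.
# $1 = name to look for, $2 = hosts file (defaults to /etc/hosts).
host_in_hosts_file() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"
}

# Example call:
#   host_in_hosts_file "$(hostname)" && echo "listed" || echo "NOT listed"
```

On Linux, 'getent hosts "$(hostname)"' is a quick complementary check that also covers DNS, depending on how nsswitch.conf is set up.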

The host_aliases file syntax is pretty simple, but it only really
becomes useful after you have a functional qmaster host. Once you have
that, examine the contents of the "act_qmaster" file, which is
created at $SGE_ROOT/$SGE_CELL/common/act_qmaster

Take the value from act_qmaster (exactly as it appears) and then  
create a host_aliases file in the same directory that has the format:

<hostname>  <IP>

... with "IP" being the address you want your compute nodes to use  
when trying to communicate with the qmaster.

host_aliases is fantastic for dealing with multiple NICs and making
sure your nodes use the private network to speak with the qmaster, but
you have to have a functional master first!
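
The host_aliases step could be scripted roughly like this once the master is up. The function name write_host_aliases is hypothetical, and it simply writes the two-field line in the format I described above; I would still double-check the field syntax against the host_aliases(5) man page for your version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: build host_aliases next to act_qmaster.
# $1 = the cell's common directory, $2 = second field for the alias line.
# Note: this overwrites any existing host_aliases file in that directory.
write_host_aliases() {
    common_dir="$1"
    addr="$2"
    # Use the master name exactly as it appears in act_qmaster.
    master=$(cat "$common_dir/act_qmaster") || return 1
    printf '%s %s\n' "$master" "$addr" > "$common_dir/host_aliases"
}

# Example call (10.130.1.1 is the sun-master-grid address from your
# hosts file, used here purely as an illustration):
#   write_host_aliases "$SGE_ROOT/$SGE_CELL/common" 10.130.1.1
```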


-Chris



On Jul 13, 2009, at 4:30 AM, brandstaetter wrote:

> Hello List!
>
> I'm trying to get ge6.2u3 installed on our cluster.
> Currently, I'm having problems with the host name resolution.
> ./inst_sge -m
> error resolving local host: can't resolve host name (h_errno = HOST_NOT_FOUND)
> can't get hostname of this machine. Installation failed.
>
> ./utilbin/lx24-amd64/gethostname
> error resolving local host: can't resolve host name (h_errno = HOST_NOT_FOUND)
>
> The master/head node has multiple (3) network interfaces. I already
> searched for information and read mentions of host_aliases, but I could
> not find a description of what to put where to get it to work.
> Can anyone please help me?
>
> Attached is the hosts file of the master node:
>
> 127.0.0.1               localhost.localdomain localhost
> ::1                     localhost6.localdomain6 localhost6
> 10.42.0.20              BIRFH-CLUHD.FH-HAGENBERG.ac.at BIRFH-CLUHD
> ### CLUSTER NAMES
> 10.130.0.1              BIRFH-CLU00
> 10.130.0.2              BIRFH-TESLAR
> 10.130.0.3              BIRFH-TESLAL
> 10.130.0.10             BIRFH-CLU01
> 10.130.0.11             BIRFH-CLU02
> 10.130.0.12             BIRFH-CLU03
> ### Sun Grid Engine
> 10.130.1.1              sun-master-grid
> 10.130.1.2              sun-teslar-grid
> 10.130.1.3              sun-telsal-grid
> 10.130.1.10             sun-01-grid
> 10.130.1.11             sun-02-grid
> 10.130.1.12             sun-03-grid
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206773
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206791



