[GE users] Installation issues
craffi
dag at sonsorol.org
Mon Jul 13 11:53:11 BST 2009
Hello,
A few suggestions ...
If you really want to see how SGE sees your hostname and DNS
environment, spend some time running the programs in
utilbin/lx24-amd64/ -- 'gethostname' and the other tools there are
what the SGE scripts actually use.
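As a rough illustration of that check, the sketch below runs the bundled
'gethostname' program and captures whatever it prints. The helper names
(utilbin_path, run_tool) are made up for this example, and the fallback
of "." for SGE_ROOT assumes you run it from the top of the unpacked
distribution, as in the quoted message.

```python
import os
import subprocess


def utilbin_path(sge_root, arch, tool):
    """Build the path to a program under <sge_root>/utilbin/<arch>/."""
    return os.path.join(sge_root, "utilbin", arch, tool)


def run_tool(path, *args):
    """Run a utilbin program.

    Returns (exit_code, combined stdout/stderr), or None if the
    program is not present at the given path.
    """
    if not os.path.isfile(path):
        return None
    proc = subprocess.run(
        [path, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Assumption: SGE_ROOT is set, or we are sitting in the install directory.
    sge_root = os.environ.get("SGE_ROOT", ".")
    result = run_tool(utilbin_path(sge_root, "lx24-amd64", "gethostname"))
    if result is None:
        print("gethostname not found under utilbin/lx24-amd64/")
    else:
        code, output = result
        print("exit code:", code)
        print(output)
```

If this prints the same HOST_NOT_FOUND error that inst_sge reports, the
problem is in name resolution on the host rather than in the installer.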
For some more practical advice ...
You posted your qmaster /etc/hosts file, which looks OK, but you did
not mention the actual system hostname. My guess, based on the error
you mention below, is that the hostname your server thinks it is
operating as is not actually listed in /etc/hosts.
What happens when you type "hostname" ? Is that exact name listed in
the hosts file or available via DNS?
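The same question can be asked from a script. This is only a sketch
using the Python standard library, not what SGE itself runs; the
function names (names_in_hosts, resolves) are invented for the example,
and the name comparison is an exact, case-sensitive match on purpose.

```python
import socket


def names_in_hosts(hosts_text):
    """Return the set of names (exact spelling) defined in hosts-file text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 2:
            # fields[0] is the address; the rest are the name and its aliases
            names.update(fields[1:])
    return names


def resolves(name, resolver=socket.gethostbyname):
    """Return the address `name` resolves to, or None if the lookup fails."""
    try:
        return resolver(name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return None


if __name__ == "__main__":
    name = socket.gethostname()  # same value the "hostname" command reports
    try:
        with open("/etc/hosts") as fh:
            hosts_text = fh.read()
    except OSError:
        hosts_text = ""
    print("hostname:", name)
    print("listed in /etc/hosts:", name in names_in_hosts(hosts_text))
    print("resolves to:", resolves(name))
```

If the second line says False and the third says None, the system
hostname is known to neither the hosts file nor DNS, which would match
the HOST_NOT_FOUND error.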
The host_aliases file syntax is pretty simple, but it only really
becomes useful after you have a functional qmaster host. Once you have
that, examine the contents of the "act_qmaster" file that is created
at $SGE_ROOT/$SGE_CELL/common/act_qmaster.
Take the value from act_qmaster (exactly as it appears) and then
create a host_aliases file in the same directory that has the format:
<hostname> <IP>
... with "IP" being the address you want your compute nodes to use
when trying to communicate with the qmaster.
host_aliases is fantastic for dealing with multiple NICs and making
sure your nodes use the private network to speak with the qmaster but
you have to have a functional master first!
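A minimal sketch of that last step, assuming the qmaster is already up
and act_qmaster exists. It writes the line in the "<hostname> <IP>" form
given in this message; the function names are invented for the example,
and it is worth comparing the result against the host_aliases man page
shipped with your Grid Engine version before relying on it.

```python
import os


def host_aliases_line(act_qmaster_text, ip):
    """Build one host_aliases line in the '<hostname> <IP>' form."""
    # Keep the name exactly as it appears; only surrounding whitespace goes.
    hostname = act_qmaster_text.strip()
    if len(hostname.split()) != 1:
        raise ValueError("act_qmaster should contain exactly one host name")
    return f"{hostname} {ip}\n"


def write_host_aliases(common_dir, ip):
    """Read act_qmaster from common_dir and write host_aliases next to it."""
    with open(os.path.join(common_dir, "act_qmaster")) as fh:
        line = host_aliases_line(fh.read(), ip)
    with open(os.path.join(common_dir, "host_aliases"), "w") as fh:
        fh.write(line)
    return line
```

Usage would look like write_host_aliases(os.path.join(os.environ["SGE_ROOT"],
os.environ["SGE_CELL"], "common"), "10.130.1.1"), where 10.130.1.1 is
only a placeholder for whichever private address the compute nodes
should use to reach the qmaster.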
-Chris
On Jul 13, 2009, at 4:30 AM, brandstaetter wrote:
> Hello List!
>
> I'm trying to get ge6.2u3 installed on our cluster.
> Currently, I'm having problems with the host name resolution.
> ./inst_sge -m
> error resolving local host: can't resolve host name (h_errno =
> HOST_NOT_FOUND)
> can't get hostname of this machine. Installation failed.
>
> ./utilbin/lx24-amd64/gethostname
> error resolving local host: can't resolve host name (h_errno =
> HOST_NOT_FOUND)
>
> The master/head node has multiple (3) network interfaces. I already
> searched for infos, and read mentions of host_alias, but I could not
> find a description of what to put where to get it to work.
> Can anyone please help me?
>
> Attached is the hosts file of the master node:
>
> 127.0.0.1 localhost.localdomain localhost
> ::1 localhost6.localdomain6 localhost6
> 10.42.0.20 BIRFH-CLUHD.FH-HAGENBERG.ac.at BIRFH-CLUHD
> ### CLUSTER NAMES
> 10.130.0.1 BIRFH-CLU00
> 10.130.0.2 BIRFH-TESLAR
> 10.130.0.3 BIRFH-TESLAL
> 10.130.0.10 BIRFH-CLU01
> 10.130.0.11 BIRFH-CLU02
> 10.130.0.12 BIRFH-CLU03
> ### Sun Grid Engine
> 10.130.1.1 sun-master-grid
> 10.130.1.2 sun-teslar-grid
> 10.130.1.3 sun-telsal-grid
> 10.130.1.10 sun-01-grid
> 10.130.1.11 sun-02-grid
> 10.130.1.12 sun-03-grid
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206773
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206791