[GE users] Exec daemon can't resolve master hostname

reuti reuti at staff.uni-marburg.de
Fri Jan 8 11:01:45 GMT 2010


On 07.01.2010 at 22:59, reidac wrote:

> Hi all --
>
> I am attempting to install the Debian-packaged version of the Sun  
> GridEngine on a group of Debian "lenny" machines, and have run into  
> a problem.
>
> The configuration is, the master host has two network interfaces, a  
> "front", routable one, and a "back" one, 192.168.0.206, connected  
> to the cluster subnet.  The "front" interface IP address has a DNS  
> host name, and $SGE_ROOT/utilbin/lx26-amd64/gethostname reports  
> this name.  The "back" interface has a name assigned via the
> /etc/hosts file.
>
> There is one submit host (so far), and it's similarly configured.
>
> The exec hosts are all on the private subnet, and can only see the  
> "back" of the master host.  All of the hosts, master, submit, and  
> exec, are configured to use the /etc/hosts name of the "back"  
> master interface as the master.
>
> But, after running for two minutes, the exec daemons report:
>> E can't send asynchronous message to commproc (qmaster:1) on host  
>> "<configured-master-name>": can't resolve host name
>
> Following this, the host disappears from the queue, and jobs can no  
> longer be run.
>
> The cluster sub-net network configuration appears to be fine. I can  
> ping the master host by name, and I can ssh to it.
> /etc/nsswitch.conf is set up for "files" name resolution on the exec
> hosts.  The sge-provided gethostbyname and gethostbyaddr give  
> answers that are consistent and correct on the exec hosts.
>
> The only possible sources of trouble I can see are, firstly, that  
> the master host's gethostname gives an answer which is not  
> consistent with the configured master host name, and secondly, in  
> the exec host's /etc/hosts files, some of the aliases for the  
> master host are the same as the master host's DNS name, i.e. that  
> of the "front" interface.

Sounds like you need an $SGE_ROOT/default/common/host_aliases file to
tell the qmaster to run on the internal interface (i.e. just one line
in this file):
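
As a rough illustration only: the two host names below are made up and not
taken from your setup. "master-back" stands for the /etc/hosts name of the
192.168.0.206 interface, and "master.example.org" for the DNS name of the
"front" interface. As I understand host_aliases(5), the first name on a line
is the unique name Grid Engine uses for the host, and the other names on that
line are treated as aliases for it:

```
master-back master.example.org
```

As far as I recall, daemons that are already running do not pick up changes
to this file until they are restarted.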

http://gridengine.sunsource.net/howto/multi_intrfcs.html
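
To see what the resolver on a given host actually returns for the master
name, a small check along these lines could be run on the master and on an
exec host. This is only a sketch in Python, not part of Grid Engine; the
function name check_name and its forward/reverse parameters are my own
invention, and it does roughly what the gethostbyname and gethostbyaddr
utilities under $SGE_ROOT/utilbin do by hand:

```python
import socket


def check_name(name, forward=socket.gethostbyname_ex, reverse=socket.gethostbyaddr):
    """Resolve name -> addresses -> names and report whether it round-trips.

    forward and reverse default to the system resolver; they are
    parameters (hypothetical, for illustration) only so that the check
    can be tried out with stand-in resolvers.
    """
    # gethostbyname_ex returns (primary_name, alias_list, address_list)
    _, _, addresses = forward(name)
    results = []
    for addr in addresses:
        # gethostbyaddr returns (primary_name, alias_list, address_list)
        primary, aliases, _ = reverse(addr)
        results.append({
            "address": addr,
            "primary": primary,
            "aliases": aliases,
            # True when the reverse lookup's primary name is the name asked for
            "round_trip": primary.lower() == name.lower(),
        })
    return results
```

Calling check_name() with the configured master name prints nothing by
itself; it returns one entry per address. An entry with "round_trip" set to
False means the address maps back to a different primary name than the one
configured, which is the kind of mismatch a host_aliases entry is meant to
paper over.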

-- Reuti


>
> I am perplexed, and would be grateful for any extra clues...

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=237342

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


