[GE users] problems with reaching hosts

Reuti reuti at staff.uni-marburg.de
Wed Oct 22 00:25:38 BST 2008


Hi Kris,

On 21.10.2008 at 20:15, K.Radacki wrote:

> Reuti wrote:
>>
>> Hi,
>>
>> On 17.10.2008 at 11:14, k.radacki wrote:
>>
>>> Dear All,
>>> after some 8 faithful years of service, the time has come to replace
>>> our queue server (the old dual P3-550 board is retiring :-).
>>> The compute nodes stayed more or less the same. I've installed SGE 6.2.
>>> Unfortunately, the execution hosts are unavailable for computations.
>>>
>>> qstat -f
>>> all.q@hall13.khazad.dum        BIP   0/0/4          -NA-     -NA-          a
>>> all.q@hall14.khazad.dum        BIP   0/0/4          -NA-     -NA-          a
>>>
>>> In .../spool/../messages I found:
>>> local configuration localhost.localdomain not defined - using global configuration
>>> main|hall13|I|starting up SGE 6.2 (lx24-amd64)
>>> main|hall13|E|can't connect to service
>>> main|hall13|E|can't get configuration from qmaster -- backgrounding
>>>
>>> The hostname command returns the proper name on all nodes:
>>> [root@hall13 ~]# hostname
>>> hall13.khazad.dum
>>>
>>> Now I have commented out this line in /etc/hosts:
>>> # 127.0.0.1     localhost.localdomain   localhost
>>> and the queue works:
>>> all.q@hall13.khazad.dum        BIP   0/0/4          0.10     lx24-amd64
>>> all.q@hall14.khazad.dum        BIP   0/0/4          0.00     lx24-amd64
>>>
>>> Can somebody explain to me why SGE uses the "wrong" host name and what
>>> I should do to correct this behaviour?
>>> I'm not that happy with commenting out localhost in the /etc/hosts file.
>>> Who knows what other problems I will get with network services.
>>
>> The loopback device is used often, and I fear that removing it might
>> lead to weird behavior. Did you check beforehand with the utility
>> programs in $SGE_ROOT/utilbin/$ARC, like gethostbyaddr et al.?
>>
>> The real question is: why does SGE think that the name of the machine
>> is localhost.localdomain at all? Were the nodes newly installed? Maybe
>> SGE is started before the network, and as no NIS answer is available,
>> it uses localhost. I always put the SGE startup at the end of the boot
>> sequence. What is the order in /etc/nsswitch.conf for checking local
>> files?
>>
>> -- Reuti 
> Hi Reuti,
> I'll try to answer your suggestions/questions in the same order.
>
> Running /Services/SGE/utilbin/lx24-x86/gethostbyaddr 192.168.1.1
> on 192.168.1.5 gives the expected answer:
> Hostname: hall01.khazad.dum
> Aliases:  hall01
> Host Address(es): 192.168.1.1
>
> As I wrote, most of the hosts were there before the new server was
> installed (and they were actually running under the supervision of
> SGE 5.3).
>
> At first I started SGE manually, whereas all other services were
> started from the "/etc/rc.d" scripts during boot. So SGE could not
> have been initialized before the network.
>
> nsswitch.conf:
> passwd:     files
> shadow:     files
> group:      files
> hosts:      files dns

- What is the content of /etc/hosts?
- The DNS entry seems to be right, but not the local one. Can you check
a) whether there is a file /etc/HOSTNAME and whether its content is
correct; b) what `$ hostname` returns? Can you set a hostname just with
the command `$ hostname hall01`?
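
The local-name checks asked for here can be sketched as a small script. This is only a minimal sketch, assuming Python 3 is available on the node. It is not an SGE utility and does not reproduce how SGE resolves names, and the function names names_by_address and loopback_aliases are made up for illustration. It prints what the resolver returns for the local hostname and reports whether /etc/hosts lists that hostname on a loopback line, which is one possible (not confirmed in this thread) way for a daemon to end up with localhost.localdomain as its name.

```python
import ipaddress
import socket


def names_by_address(hosts_text):
    """Parse /etc/hosts-style text into {address: [names, ...]}."""
    table = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address without any name is not a usable entry
        table.setdefault(fields[0], []).extend(fields[1:])
    return table


def loopback_aliases(hosts_text, hostname):
    """Return loopback addresses whose entry lists hostname or its short form."""
    short = hostname.split(".", 1)[0]
    hits = []
    for address, names in names_by_address(hosts_text).items():
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not a parseable IP address
        if is_loopback and (hostname in names or short in names):
            hits.append(address)
    return hits


if __name__ == "__main__":
    name = socket.gethostname()
    print("hostname:", name)
    print("fqdn    :", socket.getfqdn())
    try:
        print("address :", socket.gethostbyname(name))
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        print("address : lookup failed:", exc)
    try:
        with open("/etc/hosts") as fh:
            hits = loopback_aliases(fh.read(), name)
    except OSError:
        hits = []
    print("loopback entries naming this host:", hits or "none")
```

Run it on an execution host such as hall13. If the printed address is a loopback address, or a loopback entry names the host, the local files are the first place to look.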

-- Reuti


> bootparams: files
> ethers:     files
> netmasks:   files
> networks:   files
> protocols:  files
> rpc:        files
> services:   files
> netgroup:   files
> publickey:  files
> automount:  files
> aliases:    files
>
>
> Best regards,
> Kris
>
>
>
> ---------------------------------------------------------------------  
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net  
> For additional commands, e-mail: users-help at gridengine.sunsource.net

