[GE users] problems with reaching hosts

K.Radacki K.Radacki at mail.uni-wuerzburg.de
Tue Oct 21 19:15:41 BST 2008


Reuti wrote:
      Hi,

      On 17.10.2008 at 11:14, k.radacki wrote:

            Dear All,
            after some 8 faithful years of work, the time has come to
            replace our queue server (the old dual P3-550 board is going
            into retirement :-)
            The compute nodes stayed more or less the same. I've
            installed SGE 6.2.
            Unfortunately, the execution hosts were unavailable for
            computations.

            qstat -f
            all.q@hall13.khazad.dum        BIP   0/0/4     -NA-     -NA-          a
            all.q@hall14.khazad.dum        BIP   0/0/4     -NA-     -NA-          a

            In .../spool/../messages I've found:
            local configuration localhost.localdomain not defined - using global configuration
            main|hall13|I|starting up SGE 6.2 (lx24-amd64)
            main|hall13|E|can't connect to service
            main|hall13|E|can't get configuration from qmaster -- backgrounding

            The hostname command on all nodes gives the "proper" name:
            [root@hall13 ~]# hostname
            hall13.khazad.dum

            Now I have commented out this line in /etc/hosts
            # 127.0.0.1     localhost.localdomain   localhost
            and the queue works:
            all.q@hall13.khazad.dum        BIP   0/0/4     0.10     lx24-amd64
            all.q@hall14.khazad.dum        BIP   0/0/4     0.00     lx24-amd64

            Can somebody explain to me why SGE uses the "wrong" host name
            and what I should do to correct this behaviour?
            I'm not that happy with commenting out localhost in the
            /etc/hosts file.
            Who knows what other problems I will get with network
            services.


      the loopback device is often used, and removing it might lead to
      weird behavior, I fear. Did you check beforehand with the utility
      programs in $SGE_ROOT/utilbin/$ARC like gethostbyaddr et al.?

      The question is rather: why does SGE think that the name of the
      machine is localhost.localdomain at all? Were the nodes newly
      installed? Maybe SGE is started before the network, and as no NIS
      answer is available, it uses localhost. I always put the SGE
      startup at the end of the boot sequence. What is the order in
      /etc/nsswitch.conf for checking local files?

      -- Reuti

Hi Reuti,
I'll try to answer your suggestions/questions in the same order.

Running /Services/SGE/utilbin/lx24-x86/gethostbyaddr 192.168.1.1 on 192.168.1.5
gives the expected answer:
Hostname: hall01.khazad.dum
Aliases:  hall01
Host Address(es): 192.168.1.1
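
Since the execd log names the node localhost.localdomain, it may also help to check, on each node, which addresses the local hosts file assigns to the node's own name. The following is only a minimal sketch: the helper names addrs_for_name and maps_to_loopback are hypothetical (they are not part of SGE), and it assumes a hosts file in the usual "address name [alias...]" layout.

```shell
#!/bin/sh
# Hypothetical helpers, not part of SGE: inspect a hosts file by hand.

# addrs_for_name HOSTS_FILE NAME
# Print every address that HOSTS_FILE maps NAME to (comments are ignored).
addrs_for_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # strip comments, including commented-out lines
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "$1"
}

# maps_to_loopback HOSTS_FILE NAME
# Exit status 0 if NAME is listed on a 127.x.x.x line, non-zero otherwise.
maps_to_loopback() {
    addrs_for_name "$1" "$2" | grep -q '^127\.'
}
```

A possible use on a node would be: maps_to_loopback /etc/hosts "$(hostname)" && echo "hostname is bound to a loopback address". This only reads the file; it does not show what the resolver or SGE itself finally decides.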

As I wrote, most of the hosts were there before I installed the new server (and
they were actually running under the supervision of SGE 5.3).

I started SGE manually at first, whereas all other services were started from the
"/etc/rc.d" scripts during boot. So SGE cannot have been
initialized before the network.

nsswitch.conf:
passwd:     files
shadow:     files
group:      files
hosts:      files dns
bootparams: files
ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files
netgroup:   files
publickey:  files
automount:  files
aliases:    files
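
To compare the lookup order across many nodes without reading each file, the "hosts:" line can be extracted with a small helper. This is a sketch under assumptions: hosts_order is a hypothetical name, and the file is expected to follow the usual "database: source ..." layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the sources configured for the "hosts"
# database in an nsswitch.conf-style file, in lookup order.
hosts_order() {
    awk '
        { sub(/#.*/, "") }          # ignore comments
        $1 == "hosts:" {
            $1 = ""                 # drop the database label
            sub(/^ +/, "")          # trim the separator left behind
            print
            exit                    # only the first matching line counts here
        }
    ' "$1"
}
```

For the file above, hosts_order /etc/nsswitch.conf would print "files dns", i.e. /etc/hosts is consulted before DNS.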


Best regards,
Kris


