[GE users] Problems with SGE 6.1 installation and multiple interfaces (and VLANs)

Daniel Ruiz Molina daniel.ruiz at aomail.uab.es
Wed Dec 5 09:59:08 GMT 2007


Hi!!

A few days ago we bought a new IBM cluster with the following configuration:
       ---> 1 x3650 server
       ---> 16 x3550 compute nodes
We are now trying to install SGE 6.1. Because of the CSM (Cluster Systems
Management) installation, the two network interfaces on each node are
configured as follows:
       --> eth0: private network 192.168.12.X, for computing (production network)
       --> eth1: configured with 2 virtual interfaces and 3 IPs:
              --> vlan10: two IPs, 192.168.10.X (for monitoring (BMC)) and 192.168.11.X (for management)
              --> vlan100: public IP (a.b.c.d)

Output of "route -n" on the server:
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.12.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.11.0    0.0.0.0         255.255.255.0   U     0      0        0 vlan10
192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 vlan10
public_network  0.0.0.0         255.255.240.0   U     0      0        0 vlan100
0.0.0.0         public_gateway  0.0.0.0         UG    0      0        0 vlan100

The /etc/hosts file on the SERVER has the following configuration:
127.0.0.1       localhost
# VLAN100 Internet interface
a.b.c.d         server-int.hpc.local   server-int   server.public_domain  server
# VLAN10 BMC access to nodes (impnnNN.hpc.local)
192.168.10.254  server-hwd.hpc.local   server-hwd   server.public_domain  server
# VLAN10.1 installation and management to the nodes
192.168.11.254  server-mgnt.hpc.local  server-mgnt  server.public_domain  server
# Production cluster
192.168.12.254  server-clus.hpc.local  server-clus  server.public_domain  server
# Service hosts. Node01
192.168.10.01   imp01.hpc.local        imp01
# Management Node01
192.168.11.1    node01.hpc.local       node01
# Compute Node01 (Production)
192.168.12.1    clus01.hpc.local       clus01

The node configuration is:
          --> eth0 --> 192.168.11.X
          --> eth1 --> 192.168.12.X

Output of "route -n" on the nodes:
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.12.0    0.0.0.0         255.255.255.0   U     0      0        0 eth1
192.168.11.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         192.168.11.254  0.0.0.0         UG    0      0        0 eth0

And /etc/hosts on the nodes:
192.168.11.1    node01.hpc.local       node01
192.168.12.1    clus01.hpc.local       clus01
a.b.c.d         server-int.hpc.local   server-int   server.public_domain  server
192.168.10.254  server-hwd.hpc.local   server-hwd   server.public_domain  server
192.168.11.254  server-mgnt.hpc.local  server-mgnt  server.public_domain  server
192.168.12.254  server-clus.hpc.local  server-clus  server.public_domain  server
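
One detail visible in both hosts files above is that the names
server.public_domain and server are attached to four different addresses.
Whether that is what trips SGE here is not confirmed in this post, but it is
easy to check for. The following is a minimal Python sketch; the function name
names_with_multiple_addresses and the SAMPLE text (a shortened copy of the
entries above) are only illustrative:

```python
from collections import defaultdict


def names_with_multiple_addresses(hosts_text):
    """Return {name: [addresses]} for every name bound to more than one address."""
    seen = defaultdict(list)
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            if address not in seen[name]:
                seen[name].append(address)
    return {name: addrs for name, addrs in seen.items() if len(addrs) > 1}


# Shortened copy of the hosts entries from this post (illustrative only).
SAMPLE = """\
127.0.0.1       localhost
# management and production addresses of the server
192.168.11.254  server-mgnt.hpc.local  server-mgnt  server.public_domain  server
192.168.12.254  server-clus.hpc.local  server-clus  server.public_domain  server
192.168.12.1    clus01.hpc.local       clus01
"""

if __name__ == "__main__":
    for name, addrs in names_with_multiple_addresses(SAMPLE).items():
        print(name, "->", ", ".join(addrs))
```

To check a real machine, the contents of its /etc/hosts can be passed in place
of SAMPLE.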

We have run into some problems:
1) If we install SGE 6.1 with the default configuration, we get errors
related to name resolution. sgemaster can't start because the machine's
hostname differs from the name returned by the gethostbyname that SGE runs.
2) After manually modifying the init scripts we got sgemaster to start, but
when we try to install sgeexecd on a node, we receive a similar error message
about name resolution, referring to the real hostname.
3) Is there any way to tell "install_qmaster" which interface SGE should run
on when the computer has 2 or 3 interfaces?
4) We have already read this document,
http://gridengine.sunsource.net/howto/multi_intrfcs.html , but it didn't
help us.
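
The mismatch described in points 1 and 2 can be looked at outside SGE. The
following Python sketch resolves a name to an address and back, then compares
the two names. It assumes the failure is a forward/reverse lookup mismatch,
which this post does not confirm, and it is not the check SGE itself performs;
check_roundtrip is a hypothetical helper name:

```python
import socket


def check_roundtrip(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> address -> name and report whether the two names agree."""
    address = forward(name)
    primary, aliases, _addresses = reverse(address)
    return {
        "name": name,
        "address": address,
        "reverse_name": primary,
        "aliases": list(aliases),
        # Hostnames are compared case-insensitively.
        "consistent": primary.lower() == name.lower(),
    }


if __name__ == "__main__":
    try:
        # Start from the name the machine reports for itself.
        print(check_roundtrip(socket.gethostname()))
    except OSError as exc:  # covers socket.gaierror and socket.herror
        print("lookup failed:", exc)
```

Running it on the server and on a node shows which name the resolver treats as
primary for each address.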

First of all, thank you to those who have read this far... It implies
you have read the lines above ;)

Can someone help us? Does anybody on this list have a similar
configuration on their cluster?

I repeat: can someone help us?

I should add that we have installed SGE 6.0 and SGE 6.1 on "home-made"
clusters (built from PCs) with dual network interfaces and we didn't have
any problems. Everything ran perfectly.

Thanks.



