[GE users] Problems with SGE 6.1 installation and multiple interfaces (and VLANs)

Reuti reuti at staff.uni-marburg.de
Sun Dec 9 16:30:01 GMT 2007


Hi,

On 05.12.2007 at 10:59, Daniel Ruiz Molina wrote:

> Some days ago, we bought a new IBM Cluster with the following  
> configuration:
>       ---> 1 x3650 server
>       ---> 16 x3550 compute nodes
> We are now trying to install SGE 6.1. Because of the CSM (Cluster
> Systems Management) installation, the two network interfaces of each
> node are configured as follows:
>     --> eth0 --> private network 192.168.12.X, for computing (production network)
>     --> eth1 --> configured with 2 virtual interfaces and 3 IPs:
>         --> vlan10: two IPs --> 192.168.10.X (for monitoring (BMC)) and 192.168.11.X (for management)
>         --> vlan100: public IP (a.b.c.d)
>
> Route -n from the server:
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 192.168.12.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
> 192.168.11.0    0.0.0.0         255.255.255.0   U     0      0        0 vlan10
> 192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 vlan10
> public_network  0.0.0.0         255.255.240.0   U     0      0        0 vlan100
> 0.0.0.0         public_gateway  0.0.0.0         UG    0      0        0 vlan100
>
> File /etc/hosts from the SERVER has the following configuration:
>     127.0.0.1       localhost
>     #VLAN100 Internet interface
>     a.b.c.d         server-int.hpc.local   server-int   server.public_domain  server
>     #VLAN10 BMC access to nodes (impnnNN.hpc.local)
>     192.168.10.254  server-hwd.hpc.local   server-hwd   server.public_domain  server
>     #VLAN10.1 installation and management to the nodes
>     192.168.11.254  server-mgnt.hpc.local  server-mgnt  server.public_domain  server
>     # Production cluster
>     192.168.12.254  server-clus.hpc.local  server-clus  server.public_domain  server

Having "server.public_domain server" as aliases on all interfaces
might be the problem. What does a plain `hostname` return on the
server and on a node?
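
To see which names are affected, here is a minimal sketch in Python (my own illustration, not an SGE tool; `ambiguous_names` and `SAMPLE` are made-up names). It lists every name that appears under more than one address in hosts-file text. The sample reuses two lines from the server's /etc/hosts quoted above:

```python
from collections import defaultdict


def ambiguous_names(hosts_text):
    """Return {name: [addresses]} for names listed under more than one address."""
    seen = defaultdict(list)
    for line in hosts_text.splitlines():
        # Drop trailing comments and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            if address not in seen[name]:
                seen[name].append(address)
    return {name: addrs for name, addrs in seen.items() if len(addrs) > 1}


# Two lines taken from the server's /etc/hosts, plus localhost.
SAMPLE = """\
127.0.0.1       localhost
192.168.11.254  server-mgnt.hpc.local  server-mgnt  server.public_domain  server
192.168.12.254  server-clus.hpc.local  server-clus  server.public_domain  server
"""

if __name__ == "__main__":
    for name, addresses in ambiguous_names(SAMPLE).items():
        print(name, "->", ", ".join(addresses))
```

With this sample it reports `server.public_domain` and `server` under both 192.168.11.254 and 192.168.12.254, so looking up either name does not identify a single interface.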

>     #Service hosts. Node01
>     192.168.10.01   imp01.hpc.local         imp01
>     #Management Node01
>     192.168.11.1    node01.hpc.local        node01
>     #Compute Node01 (Production)
>     192.168.12.1    clus01.hpc.local        clus01
>
> Nodes configuration is:
>          --> eth0 --> 192.168.11.X
>          --> eth1 --> 192.168.12.X
>
> Route -n from the nodes:
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 192.168.12.0    0.0.0.0         255.255.255.0   U     0      0        0 eth1
> 192.168.11.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
> 0.0.0.0         192.168.11.254  0.0.0.0         UG    0      0        0 eth0
>
> And /etc/hosts from the nodes:
> 192.168.11.1    node01.hpc.local       node01
> 192.168.12.1    clus01.hpc.local       clus01
> a.b.c.d         server-int.hpc.local   server-int   server.public_domain  server
> 192.168.10.254  server-hwd.hpc.local   server-hwd   server.public_domain  server
> 192.168.11.254  server-mgnt.hpc.local  server-mgnt  server.public_domain  server
> 192.168.12.254  server-clus.hpc.local  server-clus  server.public_domain  server
>
> We have run into some problems:
> 1) If we install SGE 6.1 with the default configuration, we get errors
> related to name resolution. Sgemaster can't start because the name of
> the machine is different from the one returned by the gethostbyname
> that SGE runs.
> 2) After manually modifying the init scripts, we got sgemaster
> started, but when we try to install sgeexecd on a node, we receive a
> similar error message about name resolution with the real name.
> 3) Is there any way to tell "install_qmaster" which interface SGE
> should run on when the computer has 2 or 3 interfaces?

This should work if you fill the host_aliases file with proper
entries before the installation.
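
As a rough sketch of what such entries could look like: per the host_aliases(5) man page, each line is a space-separated list of names, and the first name on the line is the one used as the unique host name. The file lives in <sge_root>/<cell>/common/host_aliases. The Python helper below is hypothetical (not shipped with SGE), and it assumes SGE should use the production-network names (server-clus, clus01). Whether short or fully qualified names are appropriate depends on your setup.

```python
def host_aliases_line(primary, aliases):
    """Format one host_aliases line: unique name first, then its aliases."""
    # Remove duplicate aliases (keeping order) and the primary name itself.
    unique = [a for a in dict.fromkeys(aliases) if a != primary]
    return " ".join([primary, *unique])


# Assumption: SGE should see the production-network names as the unique ones.
# The alias names are the short names from the /etc/hosts files in this thread.
ENTRIES = {
    "server-clus": ["server-int", "server-hwd", "server-mgnt", "server"],
    "clus01": ["node01"],
}

if __name__ == "__main__":
    for primary, aliases in ENTRIES.items():
        print(host_aliases_line(primary, aliases))
```

Run as a script, this prints `server-clus server-int server-hwd server-mgnt server` and `clus01 node01`. The remaining nodes would each need a similar line.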

> 4) We have already read this document,
> http://gridengine.sunsource.net/howto/multi_intrfcs.html , but it
> didn't help us...
>
> First of all, thank you to those who have read this line... It
> implies you have read the lines above ;)

Depends - I like to read magazines from the last to the first page ;-)

-- Reuti


> Can someone help us? Does anybody on this users list have a similar
> configuration on their cluster?
>
> I repeat: can someone help us?
>
> I have to say that we have installed SGE 6.0 and SGE 6.1 on "home-made"
> clusters (built from PCs) with two network interfaces, and we didn't
> have any problems. Everything ran perfectly.
>
> Thanks.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net




