[GE users] Problems with SGE 6.1 installation and multiple interfaces (and VLANs)

Daniel Templeton Dan.Templeton at Sun.COM
Wed Dec 5 16:44:13 GMT 2007


So, you set up the host_aliases file, and it didn't resolve the issue?  
Did you restart the qmaster after editing the file?
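
For reference, here is a rough sketch of what the host_aliases entries could look like for this setup. The file lives in $SGE_ROOT/$SGE_CELL/common/host_aliases, and each line starts with the unique name Grid Engine should use, followed by the other names for the same host. Picking server-clus and clus01 (the 192.168.12.X production network) as the unique names is only my assumption, and the helper host_aliases_line is made up for illustration:

```python
def host_aliases_line(primary, aliases):
    """Render one host_aliases line: the unique name first, then its aliases."""
    names = [primary]
    for alias in aliases:
        # Skip repeats so each name appears once per line.
        if alias not in names:
            names.append(alias)
    return " ".join(names)


# Assumption: SGE should know the master by its production-network name.
line = host_aliases_line(
    "server-clus", ["server-int", "server-hwd", "server-mgnt", "server"]
)
# Assumption: each compute node is known by its clusNN name.
node_line = host_aliases_line("clus01", ["node01"])

print(line)
print(node_line)
```

The printed lines would go into the host_aliases file as they are, one per host, and the qmaster has to be restarted before it picks them up.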

Daniel

Daniel Ruiz Molina wrote:
> Hi!
>
> A few days ago we bought a new IBM cluster with the following
> configuration:
>       ---> 1 x3650 server
>       ---> 16 x3550 compute nodes
> We are now trying to install SGE 6.1. Because of the CSM (Cluster
> Systems Management) installation, the two network interfaces on each
> node are configured as follows:
>       --> eth0: private network 192.168.12.X, for computing
>           (production network)
>       --> eth1: configured with 2 virtual interfaces and 3 IPs:
>             --> vlan10: two IPs, 192.168.10.X (for monitoring (BMC))
>                 and 192.168.11.X (for management)
>             --> vlan100: public IP (a.b.c.d)
>
> Route -n from the server:
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 192.168.12.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
> 192.168.11.0    0.0.0.0         255.255.255.0   U     0      0        0 vlan10
> 192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 vlan10
> public_network  0.0.0.0         255.255.240.0   U     0      0        0 vlan100
> 0.0.0.0         public_gateway  0.0.0.0         UG    0      0        0 vlan100
>
> The /etc/hosts file on the SERVER has the following configuration:
>     127.0.0.1       localhost
>     #VLAN100 Internet interface
>     a.b.c.d         server-int.hpc.local   server-int   server.public_domain  server
>     #VLAN10 BMC access to nodes (impnnNN.hpc.local)
>     192.168.10.254  server-hwd.hpc.local   server-hwd   server.public_domain  server
>     #VLAN10.1 installation and management to the nodes
>     192.168.11.254  server-mgnt.hpc.local  server-mgnt  server.public_domain  server
>     # Production cluster
>     192.168.12.254  server-clus.hpc.local  server-clus  server.public_domain  server
>     #Service hosts. Node01
>     192.168.10.01   imp01.hpc.local        imp01
>     #Management Node01
>     192.168.11.1    node01.hpc.local       node01
>     #Compute Node01 (Production)
>     192.168.12.1    clus01.hpc.local       clus01
>
> Nodes configuration is:
>          --> eth0 --> 192.168.11.X
>          --> eth1 --> 192.168.12.X
>
> Route -n from the nodes:
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 192.168.12.0    0.0.0.0         255.255.255.0   U     0      0        0 eth1
> 192.168.11.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
> 0.0.0.0         192.168.11.254  0.0.0.0         UG    0      0        0 eth0
>
> And /etc/hosts from the nodes:
> 192.168.11.1    node01.hpc.local       node01
> 192.168.12.1    clus01.hpc.local       clus01
> a.b.c.d         server-int.hpc.local   server-int   server.public_domain  server
> 192.168.10.254  server-hwd.hpc.local   server-hwd   server.public_domain  server
> 192.168.11.254  server-mgnt.hpc.local  server-mgnt  server.public_domain  server
> 192.168.12.254  server-clus.hpc.local  server-clus  server.public_domain  server
>
> We have run into some problems:
> 1) If we install SGE 6.1 with the default configuration, we get errors
> related to name resolution. Sgemaster can't start because the name of
> the machine is different from the one returned by the gethostbyname
> that SGE runs.
> 2) After manually modifying the init scripts we got sgemaster started,
> but when we try to install sgeexecd on a node, we receive a similar
> error message about name resolution and the real host name.
> 3) Is there any way to tell "install_qmaster" which interface SGE
> should run on when the computer has 2 or 3 interfaces?
> 4) We have already read this document,
> http://gridengine.sunsource.net/howto/multi_intrfcs.html , but it
> didn't help us.
>
> First of all, thank you to those who have read this line... It
> implies you have read the lines above ;)
>
> Can someone help us? Does anybody on this list have a similar
> configuration on their cluster?
>
> To repeat: can someone help us?
>
> I should say that we have installed SGE 6.0 and SGE 6.1 on
> "home-made" clusters (built from PCs) with two network interfaces and
> we didn't have any problems. Everything ran perfectly.
>
> Thanks.
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
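
One thing that stands out in the quoted /etc/hosts files is that the names server and server.public_domain appear on four different lines, so a lookup of either name can land on any of the four interfaces. A small sketch to list such names is below. The HOSTS_TEXT sample is copied from the message above (a.b.c.d is the poster's own placeholder), and ambiguous_names is a made-up helper, not part of SGE:

```python
from collections import defaultdict

# Sample copied from the quoted /etc/hosts; a.b.c.d is the poster's placeholder.
HOSTS_TEXT = """\
127.0.0.1       localhost
a.b.c.d         server-int.hpc.local   server-int   server.public_domain  server
192.168.10.254  server-hwd.hpc.local   server-hwd   server.public_domain  server
192.168.11.254  server-mgnt.hpc.local  server-mgnt  server.public_domain  server
192.168.12.254  server-clus.hpc.local  server-clus  server.public_domain  server
192.168.11.1    node01.hpc.local       node01
192.168.12.1    clus01.hpc.local       clus01
"""


def ambiguous_names(hosts_text):
    """Return {name: [addresses]} for names listed under more than one address."""
    addresses_by_name = defaultdict(list)
    for raw_line in hosts_text.splitlines():
        # Drop comments and skip blank lines.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            if address not in addresses_by_name[name]:
                addresses_by_name[name].append(address)
    return {
        name: addresses
        for name, addresses in addresses_by_name.items()
        if len(addresses) > 1
    }


for name, addresses in ambiguous_names(HOSTS_TEXT).items():
    print(name, "->", ", ".join(addresses))
```

Names reported by this check are probably the ones to leave out of the SGE configuration, or to map onto a single unique name through host_aliases.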

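Problem 1) in the quoted message (the machine name differing from what gethostbyname returns) can be reproduced outside SGE with a forward lookup followed by a reverse lookup. The following is only a sketch using Python's standard socket module. The function names round_trip and is_consistent are my own, and the resolver functions are passed as parameters so the logic can be tried without touching the real resolver:

```python
import socket


def round_trip(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name to an address, then resolve that address back to a name."""
    address = forward(name)
    # gethostbyaddr returns (hostname, alias list, address list).
    canonical, _aliases, _addresses = reverse(address)
    return address, canonical


def is_consistent(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """True when the reverse lookup gives back the name we started from."""
    _address, canonical = round_trip(name, forward=forward, reverse=reverse)
    return canonical.lower() == name.lower()
```

Calling round_trip(socket.gethostname()) on the master and on a node shows which address and which name the resolver actually hands back. With hosts files like the ones quoted, I would expect the returned name to be one of the per-interface names rather than the plain host name, which would match the error described, but that is a guess until it is run on the real machines.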