[GE users] Using one SGE qmaster for two clusters

Reuti reuti at staff.uni-marburg.de
Wed May 14 09:35:14 BST 2008


Hi Roberta,

On 13.05.2008 at 22:11, Roberta Gigon wrote:

> I have the SGE qmaster running on the head node of one of my
> clusters with the nodes of that cluster set up as execution hosts.
> I'm now wondering what I need to do to set up the nodes on a
> second cluster to use the same qmaster.  The nodes of the second
> cluster are behind a head node and on a private network, but are
> using NAT and can "see" the head node of the first cluster (they
> can telnet to port 6444 on that system just fine),

this is a problem: the qmaster can't contact any machine in the NAT
subcluster. Often you can set up the NAT gateway (or router) to redirect
incoming connections on a certain port to a specific machine. For
example, for a webserver in the subcluster you can bind port 80 on the
NAT-like router so that it is always forwarded to exactly that machine.

Can you log in from the qmaster to any of the machines in the subcluster?
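
A quick way to test this direction is a plain TCP connect, run on the qmaster host. This is only a minimal sketch: the function name `can_connect` is made up for illustration, and which port to probe is an assumption (22 for ssh, or 6445 if sge_execd is meant to listen on its usual default, which is not stated in this thread).

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False

# Hypothetical usage on the qmaster host (node name taken from the error
# message in the original question, port 22 assumed):
#   can_connect("r1i0n0-ib0.cambridge-us1089.slb.com", 22)
```

If this returns False for every node behind the gateway, the qmaster has no route (or no name) for them, independent of whether the nodes can reach port 6444 in the other direction.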

What you are seeing now is the qmaster trying to resolve the address of
the connecting host, but I assume it only knows the address of the
router, i.e. the NAT gateway.
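
To see what the qmaster host itself makes of a node name, something like the following could be run there. It is a rough sketch using the system resolver through Python's `socket` module; it does not reproduce Grid Engine's own lookup logic, and the helper name `describe_resolution` and its injectable `forward`/`reverse` parameters are only illustrative.

```python
import socket

def describe_resolution(name, forward=socket.gethostbyname,
                        reverse=socket.gethostbyaddr):
    """Report forward and reverse lookup results for name, as seen from this host."""
    result = {"name": name, "address": None, "reverse_name": None}
    try:
        result["address"] = forward(name)
    except OSError:
        # Forward lookup failed: this host does not know the name at all.
        return result
    try:
        # gethostbyaddr returns (hostname, aliases, addresses); keep the hostname.
        result["reverse_name"] = reverse(result["address"])[0]
    except OSError:
        # The address has no reverse mapping on this host.
        pass
    return result

# Hypothetical usage on the qmaster host:
#   describe_resolution("r1i0n0-ib0.cambridge-us1089.slb.com")
```

An `address` of None on the qmaster host would be consistent with the HOST_NOT_RESOLVABLE message quoted below.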

-- Reuti


> but when I try to install the sgeexecd on the nodes I get this error:
>
> error: commlib error: access denied (server host resolves source host "r1i0n0-ib0.cambridge-us1089.slb.com" as "(HOST_NOT_RESOLVABLE)")
> ERROR: unable to contact qmaster using port 6444 on host "bear.cl.slb.com"
>
> I can traceroute to the SGE qmaster:
> r1i0n0 /opt/sge# traceroute bear.cl.slb.com
> traceroute to bear.cl.slb.com (163.188.42.200), 30 hops max, 40 byte packets
>  1  service0-ib0.cambridge-us1089.slb.com (10.148.0.68)  0.039 ms  0.034 ms  0.035 ms
>  2  bear.cambridge-us1089.slb.com (163.188.42.200)  0.136 ms  0.148 ms  0.153 ms
>
> I can telnet to it on port 6444:
> r1i0n0 /opt/sge# telnet bear.cl.slb.com 6444
> Trying 163.188.42.200...
> Connected to bear.cl.slb.com.
> Escape character is '^]'.
> ^]
> telnet> quit
> Connection closed.
>
> Is there a way around this?
>
> Thanks in advance,
> Roberta
>
> ----------------------------------------------------------------------
> Roberta M. Gigon
> Schlumberger-Doll Research
> One Hampshire Street, MD-B253
> Cambridge, MA 02139
> 617.768.2099 - phone
> 617.768.2381 - fax
>
> This message is considered Schlumberger CONFIDENTIAL.  Please treat  
> the information contained herein accordingly.
>

