[GE users] SGE and TruCluster

James Chamberlain jamesc at exa.com
Tue May 9 05:21:36 BST 2006


That's the very problem I've been working on, Rayson.  Any idea how to get 
sge_execd to bind to the local IP?  I haven't seen an option for sge_execd 
which lets me specify which IP to bind to, but have been trying to 
investigate on the assumption that a facility to do this exists within 
TruCluster.  SGE can't be the only thing affected by an issue like this.
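
For reference, binding to one address instead of the wildcard is a small 
change at the socket level.  What follows is only a minimal sketch in C, 
not code taken from SGE: bind_to_address() is a hypothetical helper, and 
how sge_execd would be told which IP to use is left open.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of SGE: create a TCP socket bound to one
 * specific IPv4 address instead of INADDR_ANY.
 * Returns the socket descriptor on success, or -1 on failure.
 */
int bind_to_address(const char *ip, unsigned short port)
{
    struct sockaddr_in serv_addr;
    int sockfd;

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(port);

    /* inet_pton() returns 1 only if ip parses as a dotted-quad address */
    if (inet_pton(AF_INET, ip, &serv_addr.sin_addr) != 1)
        return -1;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;

    /* Bind to the given address only, rather than to the wildcard */
    if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```

Called with a member's own IP and port 536, this would leave the cluster 
address untouched, assuming the process is allowed to bind to a port below 
1024.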

Thanks,

James

On Mon, 8 May 2006, Rayson Ho wrote:

> Maybe you got everything working by now? :)
>
> Anyway, I used to work with TruClusters a few years ago, and I
> remember there are 2 IPs on each node (one is local to the node, and
> one is to address the entire cluster). If you want to have an SGE
> cluster running within the TruCluster, then you should tell each execd
> to bind to the IP address that is local to its node.
>
> I think I need to read the TruCluster docs to refresh my memory :p,
> but do let me know if you get any further...
>
> Rayson
>
>
>
>
> On 5/5/06, James Chamberlain <jamesc at exa.com> wrote:
>> Hi Fred,
>> 
>> Thanks for the feedback.
>> 
>> TruCluster is an interesting environment.  It seems to be set up for load
>> balancing by default, with a single root filesystem shared amongst all its
>> members.  Any changes you make on one node to most of the files in /etc
>> immediately show up on all other nodes.
>> 
>> I'm hoping there may be a less drastic way around this than to patch the
>> code.  I notice that inetd, ntpd, apache and other network services all are
>> listening on the wildcard address on all nodes with no problem.  Sure, HP
>> could have modified them all to be cluster aware, but I'd like to believe
>> there's a way for unmodified apps to bind to the wildcard address, too.
>> 
>> Thanks,
>> 
>> James
>> 
>> On Fri, 5 May 2006, Fred L Youhanaie wrote:
>> 
>> >
>> > Hi James,
>> >
>> > James Chamberlain wrote:
>> >> Hi all,
>> >>
>> >> Does anyone know of a way to make sge_execd bind to a specific interface?
>> >> By binding to *:536, sge_execd is picking up the cluster IP address in a
>> >> TruCluster (Tru64 UNIX) environment.  As a result, other nodes in the
>> >> TruCluster say they can't start sge_execd because port 536 is already in
>> >> use. For reference, I'm using SGE 5.3p7.
>> >
>> > I just had a look at the source. It appears that execd ultimately calls
>> > cl_com_tcp_connection_request_handler_setup(), which in turn binds to the
>> > 'wildcard' address.
>> >
>> > source/libs/comm/cl_tcp_framework.c:
>> > =====================
>> > int cl_com_tcp_connection_request_handler_setup(...)
>> > ...
>> >   /* bind an address to socket */
>> >   ...
>> >   serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
>> >   ...
>> >   if (bind(sockfd, (struct sockaddr *) &serv_addr, ...
>> > =====================
>> >
>> > So, you will need to file an RFE, or patch it yourself!
>> >
>> > Cheers
>> > f.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> > For additional commands, e-mail: users-help at gridengine.sunsource.net
>> >
>> 
>> 
>> 
>
>




