[GE users] SGE and TruCluster

Rayson Ho rayrayson at gmail.com
Tue May 9 00:18:29 BST 2006



Maybe you got everything working by now?? :)

Anyway, I used to work with TruClusters a few years ago, and I
remember that each node has 2 IPs (one local to the node, and one
used to address the entire cluster). If you want to have an SGE
cluster running within the TruCluster, then you should tell each execd
to bind to the IP address that is local to its node.

I think I need to read the TruCluster docs to refresh my memory :p,
but do let me know if you get any further...

Rayson




On 5/5/06, James Chamberlain <jamesc at exa.com> wrote:
> Hi Fred,
>
> Thanks for the feedback.
>
> TruCluster is an interesting environment.  It seems to be set up for load
> balancing by default, with a single root filesystem shared amongst all its
> members.  Any changes you make on one node to most of the files in /etc
> immediately show up on all other nodes.
>
> I'm hoping there may be a less drastic way around this than patching the
> code.  I notice that inetd, ntpd, apache and other network services are all
> listening on the wildcard address on all nodes with no problem.  Sure, HP
> could have modified them all to be cluster-aware, but I'd like to believe
> there's a way for unmodified apps to bind to the wildcard address, too.
>
> Thanks,
>
> James
>
> On Fri, 5 May 2006, Fred L Youhanaie wrote:
>
> >
> > Hi James,
> >
> > James Chamberlain wrote:
> >> Hi all,
> >>
> >> Does anyone know of a way to make sge_execd bind to a specific interface?
> >> By binding to *:536, sge_execd is picking up the cluster IP address in a
> >> TruCluster (Tru64 UNIX) environment.  As a result, other nodes in the
> >> TruCluster say they can't start sge_execd because port 536 is already in
> >> use. For reference, I'm using SGE 5.3p7.
> >
> > I just had a look at the source; it appears that execd ultimately calls
> > cl_com_tcp_connection_request_handler_setup(), which in turn binds to the
> > 'wildcard' address.
> >
> > source/libs/comm/cl_tcp_framework.c:
> > =====================
> > int cl_com_tcp_connection_request_handler_setup(...)
> > ...
> >   /* bind an address to socket */
> >   ...
> >   serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
> >   ...
> >   if (bind(sockfd, (struct sockaddr *) &serv_addr, ...
> > =====================
> >
> > So, you will need to file an RFE, or patch it yourself!
> >
> > Cheers
> > f.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
>
