[GE users] How to configure one execution host for two qmasters?

Colvin, Joshua jcolvin at sfwmd.gov
Mon Jul 9 19:06:39 BST 2007


Thanks Dan. Unfortunately, I can install sge execd (6.0u8) without being
prompted for port numbers, even though nothing but SGE_CELL and SGE_ROOT
is defined in my environment and there is nothing in /etc/services on
the execution node.

However, the execution node is running jobs fine for qmaster #1, just
not for qmaster #2 (it fails silently). If I change the ports in the
qmaster's /etc/services file and reinstall the software on the execution
node, the execution node can't talk to the qmaster at all:

   error: commlib error: can't connect to service (Connection refused)
   ERROR: unable to contact qmaster using port 536 on host "dcluster2b"

So I'm wondering where it gets these port numbers from.
Running "set | grep -i sge" returns nothing but SGE_CELL and SGE_ROOT,
and I've run "grep -R port" under SGE_ROOT with no luck.
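
One way to narrow this down is sketched below, assuming the lookup order
given in the quoted reply (environment variable first, then /etc/services,
otherwise a prompt). The function name resolve_sge_port is hypothetical,
and it only reads a flat services file; a site that serves the services
map from NIS or similar would need a different lookup.

```shell
#!/bin/sh
# Hypothetical helper: report where a Grid Engine port would come from.
#   $1 = value of the environment variable (may be empty)
#   $2 = service name to look up, e.g. sge_qmaster or sge_execd
#   $3 = services file to read (defaults to /etc/services)
resolve_sge_port() {
    env_value=$1
    service=$2
    services_file=${3:-/etc/services}

    # 1. An explicitly set environment variable takes precedence.
    if [ -n "$env_value" ]; then
        echo "$env_value (from environment)"
        return 0
    fi

    # 2. Otherwise look for a line such as "sge_qmaster  536/tcp".
    port=$(awk -v svc="$service" '
        $1 == svc && $2 ~ /\/tcp$/ { sub(/\/tcp$/, "", $2); print $2; exit }
    ' "$services_file" 2>/dev/null)
    if [ -n "$port" ]; then
        echo "$port (from $services_file)"
        return 0
    fi

    # 3. Nothing found: per the quoted reply, the installer would ask.
    echo "undefined (installer would prompt)"
    return 1
}

# Example calls (run on both the qmaster and the execution node):
#   resolve_sge_port "$SGE_QMASTER_PORT" sge_qmaster
#   resolve_sge_port "$SGE_EXECD_PORT"   sge_execd
```

Comparing the output on the qmaster host and on the execution node
should show whether the two sides agree on the port.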



-----Original Message-----
From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM] 
Sent: Monday, July 09, 2007 1:45 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] How to configure one execution host for two
qmasters?

Josh,

The ports are defined during installation.  Before running the install,
you can set the SGE_QMASTER_PORT and/or SGE_EXECD_PORT environment
variables to force the installer to use those port numbers.  Otherwise it
will take the port numbers defined in /etc/services, or it will ask you
for port numbers if none are defined in /etc/services.
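
As a minimal sketch of the first option, assuming a POSIX shell: the
hypothetical helper with_cluster_ports below runs any command with both
variables set, so the second cluster's installer sees its own ports
without changing the login environment. The port numbers in the usage
comment are placeholders, and install_execd is assumed to be the
installer script in the second cluster's SGE_ROOT.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with cluster-specific SGE ports.
#   $1 = qmaster port, $2 = execd port, remaining args = command to run
with_cluster_ports() {
    if [ "$#" -lt 3 ]; then
        echo "usage: with_cluster_ports QMASTER_PORT EXECD_PORT COMMAND [ARGS...]" >&2
        return 2
    fi
    qmaster_port=$1
    execd_port=$2
    shift 2

    # env sets the variables only for the command it launches,
    # so the calling shell's environment is left untouched.
    env SGE_QMASTER_PORT="$qmaster_port" SGE_EXECD_PORT="$execd_port" "$@"
}

# Example (placeholder ports; run from the second cluster's SGE_ROOT):
#   with_cluster_ports 6446 6447 ./install_execd
```

Both ports have to differ from the ones the first cluster already uses on
that host, otherwise the second execd still cannot bind.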

Daniel

Colvin, Joshua wrote:
>
> Hello all,
>
> I am replacing some servers and wanted to install a new parallel
> cluster alongside the existing one. The qmasters will be different,
> but the execution nodes (for now) will be the same. I see everything
> I'd expect from both qmasters (qstat -f shows all the nodes I've
> configured for both), and I can submit jobs fine to the first cluster
> I start, however the second sge execd process refuses to start on any
> execution node. I see no error msgs anywhere (stdout, spool,
> /var/log/messages), but I imagine it can't bind to an already-used
> port, however I don't see where to define the port for sge execd (not
> in /etc/init.d, etc...).
>
> Is there any trick to getting one execution host to be a member of
> multiple clusters?
>
> Thanks!
>
> Josh
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net

