[GE users] Problems installing 6.0u1

Christophe Dupre duprec at scorec.rpi.edu
Wed Oct 6 02:19:37 BST 2004


The machine has no firewall configured.
I defined the services in the LDAP server so that all nodes can see 
them. 'getent services' shows the entries.
master.medusa.scorec.rpi.edu resolves to the IP address of the internal 
interface, so that compute nodes will be able to access it.
I will double check the name/IP configuration tomorrow.
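
One way to double check the name/IP configuration is a short script that
does the forward lookup and then the reverse lookup of the resulting
address, as suggested in the quoted reply below. This is only a sketch:
check_name is a hypothetical helper, not part of SGE, and it uses the
system resolver, so SGE's own resolution (including any host_aliases
mapping) may behave differently.

```python
import socket


def check_name(hostname, forward=socket.gethostbyname,
               reverse=socket.gethostbyaddr):
    """Return (ip, reverse_name, consistent) for hostname.

    forward and reverse default to the system resolver calls; they are
    parameters only so the check can be exercised without a network.
    consistent is True when the reverse lookup of the forward address
    returns the original name as its primary name or as an alias
    (compared case-insensitively, ignoring a trailing dot).
    """
    ip = forward(hostname)
    primary, aliases, _addresses = reverse(ip)
    names = {name.lower().rstrip(".") for name in [primary, *aliases]}
    return ip, primary, hostname.lower().rstrip(".") in names
```

Running check_name("master.medusa.scorec.rpi.edu") on the master and again
on a compute node should show whether both see the internal address and
whether the reverse lookup agrees. A failed lookup raises an OSError
subclass (socket.gaierror or socket.herror).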

BTW, this was my first question to this list, and I'm overwhelmed by 
the quick feedback. Thanks!

Chris Dagdigian wrote:
> These sorts of errors are almost always caused by configuration problems 
> relating to hostnames, hostname resolution and DNS configuration. SGE is 
> very very sensitive to these sorts of things.
> 
> In particular your server named "master.medusa.scorec.rpi.edu" should be 
> listed in your campus DNS server for both forward and reverse queries.
> 
> The hostname of the acting qmaster is going to be written to your 
> $SGE_ROOT/<cell>/common/act_qmaster
> 
> Your compute nodes on the private network will read that file and try to 
> contact the hostname listed. If that hostname is the public name and is 
> unreachable via the private network you are going to have issues. The 
> fix for that situation is to create a file at 
> $SGE_ROOT/<cell>/common/host_aliases that has an entry for 
> "master.medusa.scorec.rpi.edu <internal-hostname>" -- this will let the 
> private nodes contact the correct IP address.
> 
> 
> If this problem is not caused by hostname/DNS issues then it could also 
> be something simple like a firewall blocking your TCP port etc.
> 
> -Chris
> 
> Christophe Dupre wrote:
> 
>> I am trying to install 6.0u1 on a cluster running RHEL 3.0. The master
>> node is attached to a private network with the compute nodes, and to a
>> public network for end-user access.
>> When I try to start the daemons:
>> bash-2.05b# /etc/init.d/sgemaster  start
>>    starting sge_qmaster
>>
>> sge_qmaster didn't start!
>> Please check the messages file
>>
>>    starting sge_schedd
>> error: getting configuration: unable to contact qmaster using port 536 on
>> host "master.medusa.scorec.rpi.edu"
>> can't get configuration from qmaster -- waiting ...
>> can't get configuration from qmaster -- waiting ...
>> can't get configuration from qmaster -- waiting ...
>> error: can't get configuration from qmaster -- backgrounding
>>
>> but the messages file is empty.
>>
>> I ran the install_qmaster script and followed the instructions, but the
>> script failed while trying to start the qmaster daemon.
>>
>>
>>
>>
>> -- 
>> Christophe Dupre
>> System Administrator, Scientific Computation Research Center
>> Rensselaer Polytechnic Institute
>> Troy, NY        USA
>> Phone: (518) 276-2578  -  Fax: (518) 276-4886
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
> 
