[GE users] SGE execution host installation issue

craffi dag at sonsorol.org
Tue Jul 7 14:58:27 BST 2009


Check the contents of $SGE_ROOT/$SGE_CELL/common/act_qmaster on your
running qmaster host. That act_qmaster file contains the hostname of
the qmaster as SGE sees it. It is also the file that the sge_execd
daemons read to learn which host to contact as the master.

Make sure the hostname, exactly as it is written in act_qmaster, matches
what you have in /etc/hosts on all your SGE systems.
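
If you would rather script that comparison than eyeball it, here is a
minimal Python sketch. The function names are made up for illustration,
the cell name "default" is only an assumption for when $SGE_CELL is not
set, and the match is deliberately strict (exact, case-sensitive), which
may be stricter than what SGE itself requires.

```python
from pathlib import Path


def read_act_qmaster(sge_root, sge_cell="default"):
    """Return the qmaster hostname stored in act_qmaster.

    "default" is an assumed cell name; pass your own $SGE_CELL if it differs.
    """
    path = Path(sge_root) / sge_cell / "common" / "act_qmaster"
    return path.read_text().strip()


def addresses_for(hosts_text, name):
    """Return the addresses that hosts-file text maps to exactly `name`."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]   # drop trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue                   # blank line or address without names
        address, names = fields[0], fields[1:]
        if name in names:              # exact, case-sensitive comparison
            found.append(address)
    return found


def check(sge_root, hosts_path="/etc/hosts", sge_cell="default"):
    """Return (qmaster name, addresses the hosts file gives for that name)."""
    master = read_act_qmaster(sge_root, sge_cell)
    hosts_text = Path(hosts_path).read_text()
    return master, addresses_for(hosts_text, master)
```

Calling check(os.environ["SGE_ROOT"]) on each machine should give back
the same name and a non-empty address list everywhere. An empty list
means that host's /etc/hosts has no entry spelled the way act_qmaster
spells it.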

Another debugging tip is to check out the binaries in your
$SGE_ROOT/utilbin/<arch>/ directory -- in particular you can run the
gethostname and gethostbyname commands to see *exactly* how SGE sees
your local networking environment. The programs in utilbin are what SGE
itself uses to learn about hostname and DNS resolution.
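
For a second opinion from outside SGE, the sketch below asks the
operating system's resolver the same questions using Python's standard
socket module. It is not the utilbin code and its answers may differ
from what SGE reports, so treat it only as a cross-check. The function
name describe_host is made up for illustration.

```python
import socket


def describe_host(name=None):
    """Resolve `name` (default: this machine's own hostname) via the OS resolver.

    Returns a dict with the canonical name, aliases and addresses, or an
    "error" entry if the name cannot be resolved.
    """
    if name is None:
        name = socket.gethostname()
    try:
        canonical, aliases, addresses = socket.gethostbyname_ex(name)
    except OSError as exc:   # gaierror and herror are both OSError subclasses
        return {"queried": name, "error": str(exc)}
    return {
        "queried": name,
        "canonical": canonical,
        "aliases": aliases,
        "addresses": addresses,
    }
```

Running describe_host() and describe_host("master") on an execution
host shows whether the operating system can resolve the local name and
the qmaster name at all. If the result here disagrees with the utilbin
output, the utilbin output is the one that matters to SGE.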

-Chris



On Jul 7, 2009, at 9:09 AM, radical_monkey wrote:

> I'm new to the Grid concept and have only just begun to experience
> it over the last 2 weeks. My task is to set up 3 VMs running Ubuntu
> 9.04, with one functioning as a qmaster and the other two as
> execution hosts. So far (and with great difficulty) I've managed to
> install the qmaster on one of the VMs.
>
> I've completed all the install prerequisites, such as setting up
> password-less SSH between machines and setting up NFS. The
> problem I'm getting is that when I try to run the install_execd
> script I get the following error:
>
> Checking hostname resolving
> ---------------------------
>
> Cannot contact qmaster. The command failed:
>
>   ./bin/lx24-x86/qconf -sh
>
> The error message was:
>
>   ERROR: unable to send message to qmaster using port 6444 on host
>   "master": can't resolve host name
>
> You can fix the problem now or abort the installation procedure.
> The problem can be:
>
>   - the qmaster is not running
>   - the qmaster host is down
>   - an active firewall blocks your request
>
> Contact qmaster again (y/n) ('n' will abort) [y] >>
>
> The hostname in the /etc/hosts and /etc/hostname files is "master".
> I can connect via SSH and NFS is working perfectly, but I can't
> seem to find the cause of this error. I'm assuming it's trying to
> access the qmaster with a different hostname than the one specified,
> but I can't find any reason why it would be doing that, since the
> hostname in the error is the same as the one on the host.
>
> This has been doing my head in for the past week, so if anyone has
> any ideas as to how I could fix this I would be very grateful!
>
> Many Thanks
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206006
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206013

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


