[GE users] new exec host grief

craffi dag at sonsorol.org
Fri Feb 13 12:23:13 GMT 2009


Hi James,

The root cause seems to be that the compute node can't reach port 701
on host "linux6". Look into the usual firewall, routing, DNS lookup
and other issues that typically cause "can't reach host X, port Y"
problems.
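Before digging into SGE itself, it can help to test the bare TCP path from minitel to the qmaster. Below is a minimal Python sketch, assuming the host and port from the error message ("linux6", 701). `check_port` is a hypothetical helper, not part of SGE. It separates a name-resolution failure from a refused connection and from a silent timeout.

```python
import socket


def check_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify why a TCP connection to host:port does or does not succeed."""
    try:
        # Resolve first: a failure here points at DNS or /etc/hosts.
        addr = socket.gethostbyname(host)
    except (socket.gaierror, socket.herror):
        return "resolve-failed"
    try:
        # create_connection performs the TCP handshake, bounded by timeout.
        with socket.create_connection((addr, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered with a reset: usually nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # Packets silently dropped: typical of a firewall or routing problem.
        return "timeout"
    except OSError:
        # Anything else, e.g. "no route to host".
        return "unreachable"
```

On minitel, calling `check_port("linux6", 701)` and getting "timeout" would point at a firewall or routing problem. "refused" usually means nothing is listening on that port, for example a port mismatch in /etc/services. "resolve-failed" points at name resolution.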

Some suggestions:

- Double check that the exact hostname listed in
$SGE_ROOT/$SGE_CELL/common/act_qmaster is resolvable and that there
are no typos in /etc/hosts. Based on your pasted output, it appears
your master is called "linux6".

- Verify that DNS is not giving different information than /etc/hosts

- Check /tmp for log messages from the sge_execd

- Check the spool logs for minitel

- Check the process table on minitel to make sure there are no old or
zombie sge_execd daemons still cluttering things up

- Check the sge_qmaster spool messages file just to see if there is  
anything interesting there
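The first two checks can be partly scripted. Below is a minimal Python sketch, assuming the cell is named "default" when $SGE_CELL is unset. The function names are hypothetical. The script only inspects /etc/hosts-format text: it flags a missing name, a name listed with several addresses, a malformed (typo'd) address, and a name mapped to a loopback address.

```python
import ipaddress
import os
from typing import Dict, List


def read_act_qmaster(sge_root: str, sge_cell: str = "default") -> str:
    """Return the hostname stored in $SGE_ROOT/$SGE_CELL/common/act_qmaster."""
    path = os.path.join(sge_root, sge_cell, "common", "act_qmaster")
    with open(path) as f:
        # Strip so a trailing newline is not mistaken for part of the name.
        return f.read().strip()


def parse_hosts(text: str) -> Dict[str, List[str]]:
    """Map every name and alias in /etc/hosts-style text to its listed addresses."""
    table: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(addr)
    return table


def check_name(name: str, hosts_text: str) -> List[str]:
    """Return suspicious findings for `name`; an empty list means none found."""
    addrs = parse_hosts(hosts_text).get(name, [])
    if not addrs:
        return ["not listed"]
    problems = []
    if len(set(addrs)) > 1:
        problems.append("listed with several addresses: " + ", ".join(addrs))
    for addr in addrs:
        try:
            if ipaddress.ip_address(addr).is_loopback:
                problems.append("maps to loopback address " + addr)
        except ValueError:
            # Not a valid IPv4/IPv6 literal, e.g. a typo in the hosts file.
            problems.append("malformed address " + addr)
    return problems
```

On minitel this could be run as `check_name(read_act_qmaster(os.environ["SGE_ROOT"], os.environ.get("SGE_CELL", "default")), open("/etc/hosts").read())`. The script does not query DNS. To compare the two sources, run `getent hosts linux6` to see what the resolver returns (following nsswitch.conf), and `host linux6` to see what DNS alone says.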

On Feb 13, 2009, at 6:45 AM, lonegroover wrote:

> Hello,
>
> Trying to add a new execution host to my cluster is proving a shade
> awkward. I've added the new host to the grid according to the
> documentation, doing qconf -mq all.q, qconf -mhgrp @allhosts and
> even qconf -ah <new hostname>.
>
> I've added the relevant /etc/services entries on the new box, and
> given it the cluster name and qmaster name. It can resolve the
> qmaster name as is thanks to /etc/hosts.
>
> On the master I can see the host in the output of qstat -f, i.e.:
>
> all.q@minitel                  BIP   0/2       -NA-     -NA-          au
>
>
> .. but starting the gridengine daemon on the new box gives:
>
> root@minitel:/etc/init.d# ./gridengine-exec start
> error: can't connect to service
> error: can't get configuration from qmaster -- backgrounding
>
> error: getting configuration: unable to contact qmaster using port 701 on host "linux6"
>
>
> .. every time.
>
> Can anyone suggest a possible avenue of problem-solving opportunity?
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=104833
