[GE users] problem with install a new SGE execution host

Chunyan Wang wangch at cpsc.ucalgary.ca
Thu Feb 17 18:16:06 GMT 2005


The SGE 6.0 documentation says to use 536/tcp for sge_qmaster and 
537/tcp for sge_execd; see the chapter "Installing the Grid Engine 
Software Interactively".
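
To compare what each host actually has registered, here is a minimal 
sketch. It assumes the ports are set through a services-format file 
such as /etc/services. The function name sge_service_ports is 
hypothetical, not part of SGE. It only lists the tcp entries for 
sge_qmaster and sge_execd.

```shell
# Hypothetical helper, not part of SGE: print the TCP ports registered
# for sge_qmaster and sge_execd in a services-format file.
# Usage: sge_service_ports [FILE]    (FILE defaults to /etc/services)
sge_service_ports() {
    services_file="${1:-/etc/services}"
    awk '
        # Match "name port/tcp" entries for the two SGE daemons only.
        # Commented-out lines never match because their first field
        # starts with "#".
        ($1 == "sge_qmaster" || $1 == "sge_execd") && $2 ~ /\/tcp$/ {
            split($2, parts, "/")
            print $1, parts[1]
        }
    ' "$services_file"
}
```

Running it on both the master and the execution host should print the 
same two lines if the hosts agree on the ports.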

Joyce

Tim Harsch wrote:

> Is there a reason why you've chosen 536 for your master and 537 for 
> your execd?  The defaults are 535 and 536, respectively...
>  
> ----- Original Message -----
>
>     From: Chunyan Wang <mailto:wangch at cpsc.ucalgary.ca>
>     To: users at gridengine.sunsource.net
>     <mailto:users at gridengine.sunsource.net>
>     Sent: Wednesday, February 16, 2005 4:14 PM
>     Subject: Re: [GE users] problem with install a new SGE execution host
>
>     The "execd" was running on the sge-a host yesterday after it was
>     installed, but it is not running now. The result looks like this:
>
>     (wangc)$ ps -eaf |grep sge_execd
>        wangc   972   415  0 20:04:57 pts/2    0:00 grep sge_execd
>
>     [ sge-a:/opt/n1ge6/utilbin/sol-sparc64 ]
>     (wangc)$
>
>     I used "qping -info sge-a 536 execd 1" to check from the master
>     host and got this result:
>
>     coe01:/export/data/web/moby/cgi-bin 195 % qping -info sge-a 536 execd 1
>     endpoint sge-a/execd/1 at port 536: can't find connection
>
>     I also ran "telnet master 536" and got this result:
>
>     [ sge-a:/export/home/wangc/load-sensors ]
>     (wangc)$ telnet coe01.ucalgary.ca 536
>     Trying 136.159.169.6...
>     Connected to coe01.
>     Escape character is '^]'.
>
>     So, port 536 is open, but I don't know why execd on sge-a cannot
>     connect to the master host. Could anyone tell me what I need to
>     check next?
>
>     Thanks,
>
>     Joyce
>
>     Tim Harsch wrote:
>
>>Find and kill all sge_execd's on that host, rerun
>>$SGE_ROOT/default/common/sgeexecd as root, and verify it starts by
>>grepping ps.
>>
>>----- Original Message ----- 
>>From: "Chunyan Wang" <wangch at cpsc.ucalgary.ca>
>>To: <users at gridengine.sunsource.net>
>>Sent: Wednesday, February 16, 2005 12:19 PM
>>Subject: [GE users] problem with install a new SGE execution host
>>
>>
>>  
>>
>>>Hi all,
>>>I have sge6.3 running and want to install another execution host on
>>>sge-a. I ran the install_execd script on sge-a; $SGE_ROOT is shared
>>>across all hosts. I created a queue for sge-a, but the queue is in the
>>>"au" state, which means no report information is arriving from sge-a.
>>>I checked, and execd is not running on sge-a. I found these error
>>>messages on sge-a:
>>>[ sge-a:/tmp ]
>>>(wangc)$ ls
>>>execd_messages.300  execd_messages.571  execd_messages.637  execd_messages.699
>>>
>>>[ sge-a:/tmp ]
>>>(wangc)$ cat execd_messages.637
>>>02/15/2005 19:52:49|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
>>>02/15/2005 19:52:50|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
>>>02/15/2005 19:52:51|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
>>>02/15/2005 19:52:52|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
>>>02/15/2005 19:52:53|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
>>>
>>>Ports 536 and 537 are open, and I have root access on sge-a.
>>>I checked the discussion list and found that someone suggested using a
>>>local spool directory for the new execution host.
>>>Any suggestions about this problem?
>>>
>>>Thanks a lot!
>>>
>>>Joyce
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>



