[GE users] problem with install a new SGE execution host

Charu Chaubal Charu.Chaubal at Sun.COM
Thu Feb 17 18:20:13 GMT 2005


The numbers used for these two ports can be anything, as long as they
don't conflict with any existing service (in /etc/services and
possibly in other places, such as NIS).

Even the "defaults" 535/536/537 are not official, since they are not
registered.
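
One way to check a candidate port against a services file, as a minimal
sketch only: the helper name port_in_services is hypothetical, and it
looks at a local file, so it says nothing about NIS maps or about what
is actually listening on the port.

```shell
# Hypothetical helper: succeed (exit 0) if PORT/tcp is already listed
# in a services-style file, fail (exit 1) otherwise.
# Usage: port_in_services PORT [FILE]   (FILE defaults to /etc/services)
port_in_services() {
    port="$1"
    file="${2:-/etc/services}"
    # Field 2 of a services entry is "port/protocol", e.g. "536/tcp".
    # Lines whose first field starts with "#" are comments and are skipped.
    awk -v want="$port/tcp" '
        $1 ~ /^#/  { next }
        $2 == want { found = 1 }
        END        { exit (found ? 0 : 1) }
    ' "$file"
}
```

For example, "port_in_services 536 && echo taken" prints "taken" only if
some entry already claims 536/tcp. On systems that provide getent,
"getent services 536/tcp" should ask the configured name services (which
may include NIS) the same question.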

Regards,
	Charu


Chunyan Wang wrote:
> The SGE 6.0 documentation says to use 536/tcp for sge_qmaster and
> 537/tcp for sge_execd, in the chapter "Installing the Grid Engine
> Software Interactively".
> 
> Joyce
> 
> Tim Harsch wrote:
> 
>> Is there a reason why you've chosen 536 for your master and 537 for
>> your execd?  The defaults are 535 and 536, respectively...
>>  
>> ----- Original Message -----
>>
>>     *From:* Chunyan Wang <wangch at cpsc.ucalgary.ca>
>>     *To:* users at gridengine.sunsource.net
>>     *Sent:* Wednesday, February 16, 2005 4:14 PM
>>     *Subject:* Re: [GE users] problem with install a new SGE execution
>>     host
>>
>>     The "execd" was running on host sge-a yesterday after I installed
>>     it, but it is not running now. The result looks like this:
>>
>>     (wangc)$ ps -eaf |grep sge_execd
>>        wangc   972   415  0 20:04:57 pts/2    0:00 grep sge_execd
>>
>>     [ sge-a:/opt/n1ge6/utilbin/sol-sparc64 ]
>>     (wangc)$
>>
>>     I ran "qping -info sge-a 536 execd 1" on the master host and got
>>     this result:
>>
>>     coe01:/export/data/web/moby/cgi-bin 195 % qping -info sge-a 536 execd 1
>>     endpoint sge-a/execd/1 at port 536: can't find connection
>>
>>     I also ran "telnet master 536" from sge-a and got this result:
>>
>>     [ sge-a:/export/home/wangc/load-sensors ]
>>     (wangc)$ telnet coe01.ucalgary.ca 536
>>     Trying 136.159.169.6...
>>     Connected to coe01.
>>     Escape character is '^]'.
>>
>>     So port 536 is open, but I don't know why execd on sge-a cannot
>>     connect to the master host. Could anyone tell me what I need to
>>     check next?
>>
>>     Thanks,
>>
>>     Joyce
>>
>>     Tim Harsch wrote:
>>
>>>Find and kill all sge_execd processes on that host, rerun
>>>$SGE_ROOT/default/common/sgeexecd as root, and verify that it starts by
>>>grepping ps.
>>>
>>>----- Original Message ----- 
>>>From: "Chunyan Wang" <wangch at cpsc.ucalgary.ca>
>>>To: <users at gridengine.sunsource.net>
>>>Sent: Wednesday, February 16, 2005 12:19 PM
>>>Subject: [GE users] problem with install a new SGE execution host
>>>
>>>
>>>  
>>>
>>>>Hi all,
>>>>I have sge6.3 running and want to add host sge-a as another execution
>>>>host. I ran the install_execd script on sge-a; $SGE_ROOT is shared
>>>>across all hosts. I created a queue for sge-a, but the queue is in the
>>>>"au" state, which means no report information is arriving from the
>>>>sge-a host. I checked, and execd is not running on sge-a. I found this
>>>>error message on the sge-a host:
>>>>[ sge-a:/tmp ]
>>>>(wangc)$ ls
>>>>execd_messages.300  execd_messages.571  execd_messages.637
>>>>execd_messages.699
>>>>
>>>>[ sge-a:/tmp ]
>>>>(wangc)$ cat execd_messages.637
>>>>02/15/2005 19:52:49|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
>>>>02/15/2005 19:52:50|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
>>>>02/15/2005 19:52:51|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
>>>>02/15/2005 19:52:52|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
>>>>02/15/2005 19:52:53|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
>>>>
>>>>Ports 536 and 537 are open, and I have root access on sge-a.
>>>>I checked the discussion list and found that someone suggested using a
>>>>local spool directory for the new execution host.
>>>>Any suggestions about this problem?
>>>>
>>>>Thanks a lot!
>>>>
>>>>Joyce
>>>>
>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>
>>>>    
>>>>
>>>
>>>
>>>  
>>>
>>
> 

-- 
####################################################################
# Charu V. Chaubal              # Phone: (650) 786-7672 (x87672)   #
# Grid Computing Technologist   # Fax:   (650) 786-4591            #
# Sun Microsystems, Inc.        # Email: charu.chaubal at sun.com     #
####################################################################





