[GE users] problem with install a new SGE execution host

Tim Harsch harsch1 at llnl.gov
Thu Feb 17 00:50:06 GMT 2005


Is there a reason why you've chosen 536 for your master and 537 for your execd?  The defaults are 535 and 536, respectively...
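For reference, here is a minimal sketch of how to see which ports an installation is actually using. It assumes the ports come either from the SGE_QMASTER_PORT / SGE_EXECD_PORT environment variables or from sge_qmaster / sge_execd entries in a services file. The helper name lookup_port is made up for illustration.

```shell
# Hypothetical helper: print the TCP port registered for a service name
# in a services-style file (defaults to /etc/services).
lookup_port() {
    name=$1
    file=${2:-/etc/services}
    # Match lines such as "sge_qmaster  536/tcp" and print only the port number.
    awk -v n="$name" '$1 == n && $2 ~ /\/tcp$/ { sub(/\/tcp$/, "", $2); print $2; exit }' "$file" 2>/dev/null
}

# Assumption: an environment variable, if set, overrides the services entry.
echo "qmaster port: ${SGE_QMASTER_PORT:-$(lookup_port sge_qmaster)}"
echo "execd port:   ${SGE_EXECD_PORT:-$(lookup_port sge_execd)}"
```

Running this on both the master and the execution host should show whether the two sides agree on the port numbers.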

----- Original Message ----- 
  From: Chunyan Wang 
  To: users at gridengine.sunsource.net 
  Sent: Wednesday, February 16, 2005 4:14 PM
  Subject: Re: [GE users] problem with install a new SGE execution host


  The execd was running on the sge-a host yesterday after installation, but it is not running now. The ps output looks like this:

  (wangc)$ ps -eaf |grep sge_execd
     wangc   972   415  0 20:04:57 pts/2    0:00 grep sge_execd

  [ sge-a:/opt/n1ge6/utilbin/sol-sparc64 ]
  (wangc)$ 

  I ran "qping -info sge-a 536 execd 1" on the master host and got this result:

  coe01:/export/data/web/moby/cgi-bin 195 % qping -info sge-a 536 execd 1
  endpoint sge-a/execd/1 at port 536: can't find connection

  I also tried "telnet master 536" from sge-a and got this result:

  [ sge-a:/export/home/wangc/load-sensors ]
  (wangc)$ telnet coe01.ucalgary.ca 536
  Trying 136.159.169.6...
  Connected to coe01.
  Escape character is '^]'.

  So port 536 is open, but I don't know why execd on sge-a cannot connect to the master host. Could anyone tell me what I need to check next?

  Thanks,

  Joyce

  Tim Harsch wrote:

Find and kill all sge_execd processes on that host, rerun
$SGE_ROOT/default/common/sgeexecd as root, and verify that it starts by
grepping the ps output.
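
A rough sketch of those steps, assuming PIDs can be read from ps in "PID COMMAND" form and that the cell is named default. The function names are invented for illustration, and restart_execd is only defined here, not called.

```shell
# Read "PID COMMAND" lines on stdin and print the PIDs whose command is
# sge_execd (with or without a leading path). Because it compares the
# command name itself, a "grep sge_execd" process is not counted.
execd_pids() {
    awk '{ c = $2; sub(/.*\//, "", c); if (c == "sge_execd") print $1 }'
}

# Hypothetical wrapper around the three steps; must be run as root.
restart_execd() {
    pids=$(ps -e -o pid= -o comm= | execd_pids)
    # Stop any leftover daemons first.
    [ -n "$pids" ] && kill $pids
    # Start script; "default" is the assumed cell name.
    "$SGE_ROOT/default/common/sgeexecd"
    # Verify that a daemon is now running (prints its PID if so).
    ps -e -o pid= -o comm= | execd_pids
}
```

If the last step prints nothing, the daemon exited again right after starting, and the execd_messages file is the next place to look.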

----- Original Message ----- 
From: "Chunyan Wang" <wangch at cpsc.ucalgary.ca>
To: <users at gridengine.sunsource.net>
Sent: Wednesday, February 16, 2005 12:19 PM
Subject: [GE users] problem with install a new SGE execution host


  Hi all,
I have sge6.3 running. I want to install another execution host on
sge-a. I ran the install_execd script on sge-a. We share $SGE_ROOT across
all hosts. I created a queue for sge-a, and the queue is in the "au"
state, which means no load report has been received from sge-a. I checked,
and execd is not running on sge-a. I found these error messages on sge-a:
[ sge-a:/tmp ]
(wangc)$ ls
execd_messages.300  execd_messages.571  execd_messages.637
execd_messages.699

[ sge-a:/tmp ]
(wangc)$ cat execd_messages.637
02/15/2005 19:52:49|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
02/15/2005 19:52:50|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
02/15/2005 19:52:51|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
02/15/2005 19:52:52|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
02/15/2005 19:52:53|execd|sge-a|C|can't create execd handle for "execd" with id 1, using port 537
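
Each entry in the messages file is a set of pipe-separated fields (timestamp, component, host, severity, text). As a small sketch, one complete entry, joined onto a single line, can be split like this to pull out the severity and the port execd tried to use. parse_execd_msg is a made-up helper name, and the field meanings are inferred from the entries shown here rather than from the documentation.

```shell
# Hypothetical helper: split one execd_messages entry into its
# pipe-separated fields and print "severity port".
parse_execd_msg() {
    line=$1
    # Assumed fields: timestamp | component | host | severity | message text
    severity=$(printf '%s\n' "$line" | cut -d'|' -f4)
    # Pull the number that follows "using port" out of the message text.
    port=$(printf '%s\n' "$line" | sed -n 's/.*using port \([0-9][0-9]*\).*/\1/p')
    echo "$severity $port"
}

# prints: C 537
parse_execd_msg "02/15/2005 19:52:49|execd|sge-a|C|can't create execd handle for \"execd\" with id 1, using port 537"
```

The port printed here (537) is the one execd tried to use, which is worth comparing with the port passed to qping.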

Ports 536 and 537 are open, and I have root access on sge-a.
I checked the discussion list and found that someone suggested using a
local spool directory for the new execution host.
Any suggestions about this problem?

Thanks a lot!

Joyce


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net

    

  


