[GE users] SGE not freeing up client endpoints

Olle Liljenzin olle at carmen.se
Tue Mar 1 08:14:19 GMT 2005


We see this when sge_execd is not properly stopped at shutdown.

There are two patches for the Linux installation that just need to be
checked in to CVS by someone with permission to do so:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1424
http://gridengine.sunsource.net/issues/show_bug.cgi?id=1426
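
Until the stale endpoint is released, the manual restart described in the
quoted message could in principle be automated with a retry wrapper. The
following is only a minimal sketch under stated assumptions: the names
start_with_retry and run_command are hypothetical, the retry count and delay
are arbitrary, and it assumes the caller has some reliable way to tell whether
sge_execd came up (the exit status of a start command may not reflect the
"endpoint is not unique" error, so that check may need to be replaced).

```python
import subprocess
import time
from typing import Callable, Sequence


def start_with_retry(
    start: Callable[[], bool],
    attempts: int = 10,
    delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call start() until it returns True.

    Returns the number of attempts that were needed.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if start():
            return attempt
        if attempt < attempts:
            # Assumption: waiting gives qmaster time to drop the old endpoint.
            sleep(delay_s)
    raise RuntimeError(f"start did not succeed after {attempts} attempts")


def run_command(argv: Sequence[str]) -> bool:
    """Run a start command and report success as 'exit status 0'.

    Assumption: exit status is a usable success signal for your start
    command; if it is not, supply a different check to start_with_retry.
    """
    return subprocess.run(list(argv)).returncode == 0
```

It would be called with whatever command starts the execution daemon on the
node, for example start_with_retry(lambda: run_command([path_to_start_script])),
where path_to_start_script is a placeholder for your site's own script. This
only works around the symptom; stopping sge_execd cleanly at shutdown is still
the real fix.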

/Olle

Sean Dilda wrote:
> I'm using SGE 6.0u3.  I had a problem where if I rebooted a compute 
> node, it would come back up before SGE had acknowledged that the node 
> was down, and sge_execd wouldn't start right.  I believe I've fixed this 
> by reducing max_unheard.  However, now when the node reboots, sge_execd 
> prints out this error when it tries to start:
> 
> 02/28/2005 16:09:06|execd|cbcb-n12|E|commlib error: endpoint is not 
> unique error (endpoint "cbcb-n12/execd/1" is already connected)
> 02/28/2005 16:09:09|execd|cbcb-n12|E|getting configuration: unable to 
> contact qmaster using port 535 on host "head4"
> 02/28/2005 16:09:09|execd|cbcb-n12|W|can't get configuration from 
> qmaster -- waiting ...
> 02/28/2005 16:09:10|execd|cbcb-n12|E|there is already a client endpoint 
> cbcb-n12/execd/1 connected to qmaster service
> 
> I will wait a few minutes after the node rebooted, and SGE is definitely 
> showing it as down, however if I try to restart sge_execd, it'll still 
> give this same error.  However, if I wait long enough (haven't timed to 
> see how long that is), I will finally be able to start sge_execd without 
> errors.
> 
> Has anyone else seen this?  Is there some reason SGE isn't freeing up 
> the endpoint?  Is there something I can do to keep from having to 
> manually restart sge_execd every time I reboot a compute node?
> 
> Thanks,
> 
> 
> Sean
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 






