[GE users] commd infinite loop?

Ken McMillan mcmillan at cadence.com
Wed Jun 2 00:34:21 BST 2004


On Tue, 1 Jun 2004, Sean Dilda wrote:

> On Sun, 2004-05-30 at 23:49, Ken McMillan wrote:
> > I've just installed SGE 5.3p4 and, while it seems to work, sge_commd is
> > using 100% CPU on the master host, while the system is idle (it is
> > using negligible CPU on the other hosts). Does anyone have an
> > idea of what might cause this?
> 
> I'm not sure if it's the same thing, but there is a case I know of where
> commd will shoot to 100% CPU usage.  Try running netstat.  Does it look
> like commd is opening a bunch of connections to itself (i.e. to
> localhost)?  If so, you're probably hitting the same thing.
> 
> It seems that when sge_execd starts, it gets the hostname of the local
> machine and sends that to sge_qmaster.  Then when qmaster wants to talk to
> that machine, commd ends up opening a connection to whatever that
> hostname is.  So, if one of your compute nodes has its hostname set to
> 'localhost', then it'll send that to the qmaster, and commd will start
> opening connections to localhost.  Since commd starts talking to itself
> and opening hundreds of connections, it shoots up to 100% CPU usage.
> 

In fact, "localhost" was turning up in the list of execution
hosts produced by qhost. I don't know why this happened, but
after re-installing SGE it went away, and so did the problem
with 100% CPU usage by commd.
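
For anyone who wants to check a node before starting sge_execd on it,
here is a rough sketch in Python (not part of SGE; the function names
are made up for illustration). It assumes the symptom described above:
the node's hostname is "localhost" or resolves to a loopback address.
Some distributions deliberately map the hostname to a 127.x address in
/etc/hosts, so treat a "SUSPECT" result as a hint, not a diagnosis.

```python
import ipaddress
import socket


def looks_like_loopback(hostname, addresses):
    """Return True if the name is 'localhost' or any address is a loopback address.

    Hypothetical helper for illustration; it is not an SGE tool.
    """
    name = hostname.strip().lower()
    if name == "localhost" or name.startswith("localhost."):
        return True
    for addr in addresses:
        # Drop an IPv6 scope id such as "%eth0" before parsing.
        bare = addr.split("%")[0]
        if ipaddress.ip_address(bare).is_loopback:
            return True
    return False


def resolve(hostname):
    """Resolve a hostname to a sorted list of unique address strings.

    Returns an empty list if the name cannot be resolved.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Check the name this machine reports for itself.
    name = socket.gethostname()
    addrs = resolve(name)
    verdict = "SUSPECT" if looks_like_loopback(name, addrs) else "ok"
    print(f"{name} -> {addrs}: {verdict}")
```

Run it on each execution host; any host reported as SUSPECT is worth
comparing against the host list printed by qhost.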

	Thanks!

	Ken

