[GE users] commd infinite loop?

Sean Dilda agrajag at dragaera.net
Tue Jun 1 16:12:16 BST 2004


On Sun, 2004-05-30 at 23:49, Ken McMillan wrote:
> I've just installed SGE 5.3p4 and, while it seems to work, sge_commd is
> using 100% CPU on the master host, while the system is idle (it is
> using negligible CPU on the other hosts). Does anyone have an
> idea of what might cause this?

I'm not sure if it's the same thing, but there is one case I know of where
commd will shoot to 100% CPU usage.  Try running netstat.  Does it look
like commd is opening a bunch of connections to itself (i.e. to
localhost)?  If so, you're probably hitting the same thing.
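If you'd rather not eyeball the netstat output, here is a rough Python
sketch that counts TCP connections where both ends are loopback addresses.
It assumes Linux-style `netstat -tn` output (columns Proto, Recv-Q, Send-Q,
Local Address, Foreign Address, State); other platforms lay the columns out
differently, so treat it as a starting point only.  The function names are
made up for this example.  A large count on the master host would fit the
symptom you're describing.

```python
import ipaddress
import subprocess


def is_loopback(endpoint: str) -> bool:
    """True if an 'address:port' string from netstat points at loopback."""
    host = endpoint.rsplit(":", 1)[0]
    try:
        # Covers 127.0.0.0/8 and ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a numeric address (e.g. netstat run without -n).
        return host.lower().startswith("localhost")


def count_self_connections(netstat_lines):
    """Count TCP lines whose local and foreign ends are both loopback."""
    count = 0
    for line in netstat_lines:
        fields = line.split()
        # Assumed Linux `netstat -tn` layout:
        # Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips header lines and non-TCP entries
        local, foreign = fields[3], fields[4]
        if is_loopback(local) and is_loopback(foreign):
            count += 1
    return count


if __name__ == "__main__":
    try:
        out = subprocess.run(["netstat", "-tn"],
                             capture_output=True, text=True).stdout
    except FileNotFoundError:
        print("netstat not found on this machine")
    else:
        n = count_self_connections(out.splitlines())
        print("loopback-to-loopback TCP connections:", n)
```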

It seems that when sge_execd starts, it looks up the hostname of the local
machine and sends that to sge_qmaster.  Then, when qmaster wants to talk to
that machine, commd opens a connection to whatever that hostname is.  So if
one of your compute nodes has its hostname set to 'localhost', it will send
that to the qmaster, and commd will start opening connections to localhost,
i.e. to itself.  Since commd keeps talking to itself and opening hundreds of
connections, it shoots up to 100% CPU usage.
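To check a compute node for this, something like the following Python
sketch could be run on each node.  It takes the name the machine reports
for itself (socket.gethostname()) and flags it if that name is 'localhost'.
As an extra assumption beyond the case I've actually seen, it also flags a
name that resolves to a loopback address such as 127.0.0.1 (e.g. through a
bad /etc/hosts entry).  The function name hostname_is_loopback is just made
up for this example.

```python
import ipaddress
import socket


def hostname_is_loopback(hostname: str, address: str) -> bool:
    """Return True if a host would advertise itself as a loopback endpoint.

    `hostname` is the name the machine reports for itself and `address`
    is what that name resolves to (IPv4 or IPv6 text form).
    """
    # A name like "localhost" or "localhost.localdomain" is always suspect.
    if hostname.split(".")[0].lower() == "localhost":
        return True
    # Assumption: a name resolving to 127.0.0.0/8 or ::1 is also suspect.
    return ipaddress.ip_address(address).is_loopback


def check_local_host() -> None:
    hostname = socket.gethostname()
    try:
        address = socket.gethostbyname(hostname)
    except socket.gaierror:
        print(f"{hostname!r} does not resolve at all")
        return
    if hostname_is_loopback(hostname, address):
        print(f"WARNING: {hostname!r} -> {address} looks like loopback")
    else:
        print(f"OK: {hostname!r} -> {address}")


if __name__ == "__main__":
    check_local_host()
```

Any node that prints a WARNING is worth fixing (proper hostname, proper
/etc/hosts entry) before restarting sge_execd there.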

