[GE users] Issues with sge_commd

Sean Dilda agrajag at dragaera.net
Thu Jul 15 18:59:04 BST 2004


On Thu, 2004-07-15 at 13:30, Bernard Li wrote:
> Hi list:
> 
> Recently we have been having some issues with high load on our headnode
> with sge_commd stuck at 99%.  Occasionally I can do a softstop and
> bring up rcsge again and that would solve the problem.  However, most of
> the time sge_commd continues to be stuck.

Run netstat and check whether commd is making a lot of connections to
itself.  If so, check whether any of your nodes has its hostname set to
'localhost'.  I had a node like that.  It caused commd to open
connections to 'localhost', which meant it was talking to itself.  That
drove its own CPU usage up to 100% and caused timeouts when trying to
connect to the master.
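
A rough sketch of that check in Python, in case it helps. It assumes
Linux-style 'netstat -tn' output (Proto, Recv-Q, Send-Q, Local Address,
Foreign Address, State); the function names are made up for
illustration and are not part of Grid Engine. It counts TCP connections
whose two ends are both loopback addresses, and tests whether a
hostname is a 'localhost' variant.

```python
import ipaddress
import socket


def is_loopback(addr: str) -> bool:
    """Return True if addr (no port) is 'localhost' or a loopback IP."""
    if addr.lower() in ("localhost", "localhost.localdomain"):
        return True
    try:
        return ipaddress.ip_address(addr).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a resolved hostname); treat as non-loopback.
        return False


def split_host(endpoint: str) -> str:
    """Strip the trailing ':port' from an endpoint such as '127.0.0.1:535'."""
    host, _, _ = endpoint.rpartition(":")
    return host


def count_self_connections(netstat_output: str) -> int:
    """Count TCP rows whose local and foreign addresses are both loopback."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed column layout: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if is_loopback(split_host(fields[3])) and is_loopback(split_host(fields[4])):
            count += 1
    return count


def hostname_is_localhost(name: str) -> bool:
    """True if the short form of a hostname is literally 'localhost'."""
    return name.split(".")[0].lower() == "localhost"


def this_host_is_localhost() -> bool:
    """Apply the hostname check to the machine this script runs on."""
    return hostname_is_localhost(socket.gethostname())
```

Feed count_self_connections() the text of 'netstat -tn' captured on the
headnode, and run this_host_is_localhost() on each compute node.  A
large count on the headnode together with a node that reports True
would match the situation described above.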

