[GE users] commd infinite loop?

Ken McMillan mcmillan at cadence.com
Mon May 31 04:49:27 BST 2004

I've just installed SGE 5.3p4 and, although it seems to work, sge_commd is
using 100% CPU on the master host even when the system is idle (it uses
negligible CPU on the other hosts). Does anyone have an idea of what
might cause this?

I ran strace on sge_commd and saw the following sequence repeating:

gettimeofday({1085965624, 701962}, {420, 0}) = 0
select(1024, [3 4 5 6 8 10], [10], NULL, {10, 0}) = 1 (out [10], left {10, 0})
read(10, 0x806e036, 1)                  = -1 EAGAIN (Resource temporarily unavailable)

Now, if I understand correctly, read() only returns EAGAIN on a
non-blocking fd when there is no data to read.  Moreover, according to
the man page, select() returns all the file descriptors that will not
block.  So it doesn't seem to me to make any sense to use a non-blocking
fd with select().
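
One possible reading of the trace, offered as a sketch and not as a
diagnosis of sge_commd: fd 10 is passed to select() in both the read set
and the write set, and the "out [10]" in the strace result suggests it
was reported as writable only. A read() on a non-blocking fd that is
writable but has no incoming data would then fail with EAGAIN, and
because an idle socket is almost always writable, select() would return
immediately every time. The Python snippet below reproduces that pattern
with a local socket pair. The function name probe() and the use of
socketpair() are assumptions made for illustration; nothing here is
taken from the SGE sources.

```python
import errno
import os
import select
import socket


def probe():
    """Return (is_readable, is_writable, read_errno) for an idle socket.

    Hypothetical helper: a connected, non-blocking socket with no
    pending input is placed in both the read set and the write set.
    """
    a, b = socket.socketpair()
    a.setblocking(False)
    try:
        # Same shape as the traced call: the fd is in both sets.
        readable, writable, _ = select.select([a], [a], [], 1.0)
        try:
            # Read regardless of which set the fd came back in.
            os.read(a.fileno(), 1)
            read_errno = 0
        except BlockingIOError as exc:
            # No data is queued, so the non-blocking read fails.
            read_errno = exc.errno
        return (a in readable, a in writable, read_errno)
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    is_readable, is_writable, err = probe()
    print(is_readable, is_writable, errno.errorcode.get(err))
```

If this reading applies, the busy loop would come from keeping the fd in
the write set while there is nothing to send, not from the combination
of select() and a non-blocking fd as such.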

So maybe there is some other daemon that is not responding, causing
sge_commd to poll the socket indefinitely?

	Thanks for any help -- Ken McMillan
