[GE users] sge_qmaster use 99.9% CPU

Simon Gao gao at schrodinger.com
Wed May 21 18:29:51 BST 2008


Ravi Chandra Nallan wrote:
> Hi,
> What version of SGE are you using? Any hints in the qmaster messages 
> file?
>
> regards,
> ~Ravi
> Simon Gao wrote:
>> Hi,
>>
>> Just noticed that sge_qmaster has been constantly running at 99.9% of 
>> CPU time. What factors may contribute to such high CPU usage by 
>> sge_qmaster? Where should I look to find out what's going on?
>>
SGE 6.0u6.

I am not sure whether the following messages are related:

05/20/2008 15:17:01|qmaster|cluster|I|qmaster hard descriptor limit is set to 1024
05/20/2008 15:17:01|qmaster|cluster|I|qmaster soft descriptor limit is set to 1024
05/20/2008 15:17:01|qmaster|cluster|I|qmaster will use max. 1004 file descriptors for communication
05/20/2008 15:17:01|qmaster|cluster|I|qmaster will accept max. 99 dynamic event clients
05/20/2008 15:17:01|qmaster|cluster|I|starting up 6.0u6
05/20/2008 15:17:01|qmaster|cluster|W|FD_SETSIZE is limited to 1024 file descriptors on this system.
05/20/2008 15:17:01|qmaster|cluster|W|If you want to support more than 1004 qmaster clients you have to
05/20/2008 15:17:01|qmaster|cluster|W|recompile the source code with a higher FD_SETSIZE setting.
05/20/2008 15:17:01|qmaster|cluster|W|Bug Link: http://gridengine.sunsource.net/issues/show_bug.cgi?id=1502
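
The startup messages above relate three numbers: the descriptor limits (1024), FD_SETSIZE (1024), and the 1004 descriptors left for communication. A minimal sketch of that arithmetic follows. The 20-descriptor reserve is only inferred from the log (1024 - 1004), and the function names are hypothetical, not part of SGE.

```python
import resource

# Assumption: the reserve of 20 is read off the log excerpt (1024 - 1004);
# it is not taken from SGE documentation.
RESERVED_GUESS = 20


def usable_for_communication(soft_limit, fd_setsize=1024, reserved=RESERVED_GUESS):
    """Estimate descriptors left for client connections.

    The smaller of the soft limit and FD_SETSIZE caps the total,
    and a fixed reserve is subtracted from that cap.
    """
    return max(0, min(soft_limit, fd_setsize) - reserved)


def current_limits():
    """Return the (soft, hard) open-file limits of the calling process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = current_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("estimated usable for communication:", usable_for_communication(1024))
```

Under these assumptions, raising the soft limit alone would not change the result, because FD_SETSIZE still caps the total; this matches the warning that a recompile is needed to go beyond 1004 clients.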

05/20/2008 16:20:01|qmaster|cluster|E|commlib error: got send timeout (closing "clustersub.company.com/qstat/2065")
05/20/2008 16:20:01|qmaster|cluster|E|commlib error: got send timeout (closing "clustersub.company.com/qstat/2064")

05/20/2008 19:14:49|qmaster|cluster|E|can't send asynchronous message to commproc (qstat:2398) on host "clustersub.company.com": can't send response for this message id - protocol error
05/20/2008 19:14:49|qmaster|cluster|E|can't send asynchronous message to commproc (qstat:2397) on host "clustersub.company.com": can't send response for this message id - protocol error
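
To see which clients show up most often in entries like these, the messages file can be tallied. This is a minimal sketch, assuming the pipe-separated layout seen in the excerpt (date time|daemon|host|level|text) and assuming each entry sits on a single line in the file itself. The function names are hypothetical, and the two error patterns cover only the message shapes quoted above.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt:
# "MM/DD/YYYY HH:MM:SS|daemon|host|level|text"
ENTRY = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|([^|]*)\|([^|]*)\|([A-Z])\|(.*)$"
)

# Client name as it appears in the two error shapes quoted in this post.
CLIENT = re.compile(r'closing "[^"/]+/([^/"]+)/\d+"|commproc \(([^:)]+):\d+\)')


def parse_entry(line):
    """Return (timestamp, daemon, host, level, text), or None if no match."""
    m = ENTRY.match(line.rstrip("\n"))
    return m.groups() if m else None


def count_levels(lines):
    """Count entries per severity letter (I, W, E in the excerpt)."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            counts[entry[3]] += 1
    return counts


def count_error_clients(lines):
    """Count error entries per client program name (e.g. qstat)."""
    clients = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is None or entry[3] != "E":
            continue
        m = CLIENT.search(entry[4])
        if m:
            clients[m.group(1) or m.group(2)] += 1
    return clients
```

Both functions accept any iterable of lines, so an open file handle for the qmaster messages file can be passed directly; the path to that file depends on the installation.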


Besides the main head node, cluster, we also have a submission node, 
clustersub, from which users submit jobs.

Simon
