[GE users] high CPU load for sge_qmaster

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Mon May 2 09:25:19 BST 2005


Hi,
this could mean two things:

- the wait for a message to process is somehow not working, and the MT
thread is spinning

- you have incoming messages.

You can check for incoming messages with qping -dump.
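
For anyone who wants to script this check, here is a minimal sketch of how the qping call could be wrapped. It assumes the usual argument order "qping [options] host port name id" with "qmaster" and "1" as name and id, and that SGE_QMASTER_PORT is exported in the environment. The helper names qping_cmd and run_qping are made up for illustration. As far as I know, -dump keeps printing until it is interrupted and typically has to be run as root on the qmaster host, so the sketch stops it after a time limit.

```python
import os
import subprocess

def qping_cmd(host, port, dump=True, interval=None):
    """Build an argument list for qping against the qmaster (name "qmaster", id 1)."""
    args = ["qping"]
    if dump:
        args.append("-dump")              # traffic dump, as suggested above
    else:
        if interval is not None:
            args += ["-i", str(interval)] # polling interval in seconds
        args.append("-f")                 # full status output
    return args + [host, str(port), "qmaster", "1"]

def run_qping(host, dump=True, interval=None, seconds=60):
    """Run qping for a limited time and return whatever it printed."""
    # Assumption: the qmaster port is exported as SGE_QMASTER_PORT.
    port = os.environ["SGE_QMASTER_PORT"]
    try:
        done = subprocess.run(qping_cmd(host, port, dump, interval),
                              capture_output=True, text=True, timeout=seconds)
        return done.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output may arrive as bytes or str depending on the Python version.
        out = exc.stdout or ""
        return out if isinstance(out, str) else out.decode(errors="replace")
```

Calling run_qping("master-host", seconds=30) would collect about 30 seconds of dump output to post to the list; "master-host" is a placeholder for the real qmaster host name.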

Could you please post that output? The jumping EDT time suggests that
at least some messages are going back and forth.

You can also test whether the behavior changes when you shut down
the execds.
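
To compare the thread times before and after stopping the execds, the idle guide values from the quoted message below (EDT around 0.9, TET and MT above 1) can be checked mechanically. The following is only a sketch: it assumes each thread shows up in the status line as NAME: STATE (seconds), for example "MT: R (0.04)", which may not match the real qping -f output exactly. The names parse_threads and below_idle_guide, and the slack value of 0.2, are arbitrary choices for illustration.

```python
import re

# Idle guide values taken from the quoted message; rough guides, not hard limits.
IDLE_GUIDE = {"EDT": 0.9, "TET": 1.0, "MT": 1.0}

# Assumption: entries look like "MT: R (0.04)" or "MT:R(0.04)".
ENTRY = re.compile(r"(\w+):\s*(\w)\s*\(\s*(\d+(?:\.\d+)?)\s*\)")

def parse_threads(status_line):
    """Return {thread name: (state, seconds)} for every entry found in the line."""
    return {name: (state, float(secs))
            for name, state, secs in ENTRY.findall(status_line)}

def below_idle_guide(status_line, slack=0.2):
    """Return the sorted names of threads whose time is clearly below the idle guide."""
    flagged = []
    for name, (_state, secs) in parse_threads(status_line).items():
        guide = IDLE_GUIDE.get(name)
        if guide is not None and secs < guide - slack:
            flagged.append(name)
    return sorted(flagged)
```

Running below_idle_guide on a sample taken with the execds up and again on one taken after they are stopped shows whether the MT value recovers once the execds are gone.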

Stephan

Sean Dilda wrote:

>Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>  
>
>>You are right, there are no jobs in the system. Could you monitor the 
>>qping output? Is the MT: always that low?
>>If there is nothing to do, I would expect times higher than 0.4.
>>When the system is idle, as yours is, the numbers should be similar to:
>>
>>EDT:R(x) ~0.9
>>TET:R(x) > 1
>>MT:R(x) > 1
>>
>>Do you know what triggers this behavior?
>>What operating system are you using?
>>    
>>
>
>I ran qping with '-i 10 -f' for a while.  EDT seemed to bounce around, 
>always > 0.00 and < 1.00.   TET bounced around, just as likely to be 
>above 1 as below it.  And MT stayed at 0.04 the whole time.  This system 
>is running CentOS 3, which is essentially RHEL3.
>
>I have a much smaller test cluster running the same OS and the same SGE 
>binaries.  Although at one point I spent a good amount of time trying to 
>reproduce this there, I've been unable to reproduce the problem on the 
>test cluster.  I've tried all the configuration options I could think 
>of.  The same qping command on that box tended to have a similar EDT to 
>my big cluster.  The TET bounced around a bit, but was almost always 
>above 1.  It had an MT that bounced around as well, but tended to stay 
>under 1 the whole time.
>
>It looks like some jobs did exit on my big cluster while I was doing 
>this.   I know for certain that no jobs were submitted or even running 
>on my test cluster during this.
>
>I really have no idea what triggers this.  My big cluster has been in 
>this state for most of a month.  I tried to restart sge_qmaster a couple 
>of times to see if it would go away, but that never worked.
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>  
>





