[GE users] high CPU load for sge_qmaster

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Mon May 9 15:13:13 BST 2005


Hi Sean,

We got a report that sounds very similar to yours. Do you have any commlib
error messages in your qmaster messages file? The high CPU load could be
triggered by broken connections. That is just an assumption. Do you have
data to back up this idea?
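
A rough sketch of one way to check the messages file, in case it helps. It
assumes the qmaster spool directory is in the default location under
$SGE_ROOT/$SGE_CELL and that the relevant lines contain the word "commlib".
The MESSAGES variable and the count_commlib_errors helper are names made up
for this example, so adjust them to your installation:

```shell
# Assumption: default qmaster spool location; override MESSAGES if yours differs.
MESSAGES="${MESSAGES:-${SGE_ROOT:-}/${SGE_CELL:-default}/spool/qmaster/messages}"

# count_commlib_errors FILE (hypothetical helper):
# print the number of lines in FILE that mention "commlib", ignoring case.
# Prints 0 when there are no matches or the file cannot be read.
count_commlib_errors() {
    n=$(grep -ci 'commlib' "$1" 2>/dev/null) || true
    echo "${n:-0}"
}

count_commlib_errors "$MESSAGES"
```

Comparing the count from a period of high CPU load with the count shortly
after a restart would show whether the two correlate.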

Cheers,
Stephan

Sean Dilda wrote:

>christian reissmann wrote:
>  
>
>>Hi Sean,
>>
>>It is possible that one or more execds override the default
>>load_report_time. You can check this with qconf:
>>
>>qconf -sconf EXECD_HOSTNAME | grep load_report_time
>>
>>In combination with qping -dump, you can try to isolate hosts that
>>send messages to qmaster very often.
>>    
>>
>
>I have no local configurations.
>
>
>[sean@head4 sean]$ ls -l $SGE_ROOT/default/common/local_conf
>total 0
>[sean@head4 sean]$
>
>
>I ended up restarting my sge_qmaster.  It hasn't gone back to high CPU
>load yet (although I expect it will within the next 24 hours).  I went
>ahead and ran qping -dump for a minute while it wasn't using much CPU
>time.  There were 644 messages in that minute.  When qmaster was at full
>load, I got 3016 messages in 5 minutes, which comes out to about 603
>messages per minute.
>
>So, after restarting, I'm getting just as many messages while averaging
>less than 1% CPU usage (as opposed to being constantly at 100% CPU usage).
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>  
>





