[GE users] high CPU load for sge_qmaster

Sean Dilda agrajag at dragaera.net
Tue May 3 12:16:10 BST 2005


christian reissmann wrote:
> Sean,
> 
> the TAG_REPORT_REQUEST messages from the qping -dump are load reports
> from the execds. Load reports are also sent when a job finishes on an
> execd.
> 
> The high cpu load of the qmaster may have two reasons:
> 
> 1) your load_report_time is set to a (too) short value
>    (check it with qconf -sconf)
> 
> or
> 
> 2) a qmaster thread does not wait when there is nothing to do (which was
> already discussed earlier in the mail thread)
> 
> In order to find out whether point 2) applies you should wait
> for the qmaster to get into this high cpu usage condition and
> shut down all execds and do the qping -dump (as you already hinted).
> 
> Or - disable all queues (qmod -d "*"), set the load_report_time to a high
> value (qconf -mconf), wait until no jobs are running, and then check with
> qping -dump what's going on in your cluster.

My load report time is currently set to 00:00:40.

As far as clearing out the queues goes, I'm pretty certain that this
cluster has always had a job running on it since the last scheduled
outage, which was four months ago.  If I waited long enough for
everything to clear out, I would get a lot of serious complaints, and
we won't have another scheduled outage for at least a month.

However, I am going to try raising my load report time to 00:02:00 to
see if that changes anything.
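
For a rough sense of what that change could buy, here is a minimal Python
sketch of the arithmetic. It assumes each execd sends exactly one periodic
load report per load_report_time interval and ignores the extra reports
sent when a job finishes. The cluster size of 100 execds is a made-up
number for illustration, and the function names are hypothetical rather
than part of any Grid Engine tool.

```python
def hms_to_seconds(value: str) -> int:
    """Convert an HH:MM:SS string such as '00:00:40' to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def reports_per_minute(load_report_time: str, execd_count: int) -> float:
    """Periodic load reports per minute arriving at the qmaster.

    Assumption: one report per execd per interval; job-finish reports
    are not counted.
    """
    return execd_count * 60 / hms_to_seconds(load_report_time)


if __name__ == "__main__":
    EXECD_COUNT = 100  # hypothetical cluster size, not taken from this thread
    for setting in ("00:00:40", "00:02:00"):
        rate = reports_per_minute(setting, EXECD_COUNT)
        print(f"{setting}: {rate:.0f} reports/min")
```

Under those assumptions, going from 40 seconds to 2 minutes cuts the
periodic report traffic to one third, whatever the number of execds is.
Whether that is enough to bring the qmaster CPU load down is a separate
question that only the cluster itself can answer.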
