[GE users] gridengine/clients showq client shows wrong number of hosts

templedf dan.templeton at sun.com
Thu Dec 31 19:40:54 GMT 2009

I think the % utilized slots is a useful metric, but as you said, it's 
already available from qstat -g c.  Does it really need to be included 
in showq as well?  Maybe?  Anything to simplify the admin's life is good.
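As a rough illustration of the % utilized slots metric, here is a minimal Python sketch that reads `qstat -g c` output and divides used slots by total slots. It assumes the output begins with a header line containing `CLUSTER QUEUE`, `USED` and `TOTAL`, followed by a dashed separator and one row per cluster queue. Because the exact set of columns can differ between releases, it looks columns up by header name rather than by position. The function names are hypothetical.

```python
def parse_qstat_gc(text):
    """Turn `qstat -g c` output into a list of {column header: value} dicts.

    Assumes a header line containing "CLUSTER QUEUE", an optional dashed
    separator line, and one whitespace-separated row per cluster queue.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    # "CLUSTER QUEUE" is two words in the header but a single token
    # (the queue name) in every data row, so merge it into one column name.
    header = lines[0].replace("CLUSTER QUEUE", "CLUSTER_QUEUE").split()
    rows = []
    for ln in lines[1:]:
        if set(ln.strip()) == {"-"}:
            continue  # skip the dashed separator line
        rows.append(dict(zip(header, ln.split())))
    return rows


def percent_slots_used(rows):
    """Percentage of slots in use, summed over all cluster queues."""
    used = sum(int(row["USED"]) for row in rows)
    total = sum(int(row["TOTAL"]) for row in rows)
    return 100.0 * used / total if total else 0.0
```

The text could come from `subprocess.run(["qstat", "-g", "c"], capture_output=True, text=True).stdout`. One caveat: if a host belongs to several cluster queues, its slots may be counted once per queue, so the summed total can exceed the number of physical slots.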

The % utilized hosts is what showq apparently intends to report, but is
that a useful metric?  It might be.  It could be used to check that the
scheduler is behaving correctly, for example that it is not piling jobs
up on a few hosts instead of spreading them out.  In general, though,
that's only useful after a configuration change.  It would also be a pain
to calculate from what GDI (the Grid Engine Database Interface) will tell
you.
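To show how % utilized hosts differs from % utilized slots, here is a hedged Python sketch. It assumes you already have a per-host count of occupied slots (for example gathered from `qhost -j` output; that step is not shown), and `percent_hosts_busy` is a hypothetical helper. The same number of used slots gives very different values depending on placement, which is what would reveal jobs being piled onto a few hosts.

```python
from typing import Mapping


def percent_hosts_busy(used_slots_by_host: Mapping[str, int]) -> float:
    """Percentage of hosts running at least one job slot.

    `used_slots_by_host` maps each execution host name to the number of
    slots currently occupied on it (an assumed input format).
    """
    if not used_slots_by_host:
        return 0.0
    busy = sum(1 for used in used_slots_by_host.values() if used > 0)
    return 100.0 * busy / len(used_slots_by_host)


# The same 4 used slots on 4 hypothetical hosts, placed in two different ways.
piled_up = {"node1": 4, "node2": 0, "node3": 0, "node4": 0}
spread_out = {"node1": 1, "node2": 1, "node3": 1, "node4": 1}

print(percent_hosts_busy(piled_up))    # 25.0
print(percent_hosts_busy(spread_out))  # 100.0
```

Whether 25% or 100% is "correct" depends on the scheduler configuration (fill-up versus spreading), so this number only means something relative to the intended policy.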

The # used slots / # hosts metric that showq actually reports is useless 
as far as I can tell, even if the number of hosts weren't hard-coded to 
TACC's value.
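The arithmetic behind the sample line `0 of 3936 hosts (  0.00 %)` can be approximated as below. This is a sketch based only on the description in this thread (used slots divided by a hard-coded count), not the actual showq source, and the 64-slot cluster is a made-up example. It shows why the number is meaningless outside TACC: a completely busy 64-slot cluster reports under 2%.

```python
HARDCODED_TOTAL = 3936  # the TACC-specific constant reported by showq


def showq_style_line(used_slots: int) -> str:
    """Mimic the reported output: used slots over a hard-coded total."""
    pct = 100.0 * used_slots / HARDCODED_TOTAL
    return f"{used_slots} of {HARDCODED_TOTAL} hosts ({pct:6.2f} %)"


def slots_utilized_line(used_slots: int, total_slots: int) -> str:
    """The same line, but with the cluster's real slot count as denominator."""
    pct = 100.0 * used_slots / total_slots if total_slots else 0.0
    return f"{used_slots} of {total_slots} slots ({pct:6.2f} %)"


# A hypothetical 64-slot cluster with every slot busy.
print(showq_style_line(64))         # 64 of 3936 hosts (  1.63 %)
print(slots_utilized_line(64, 64))  # 64 of 64 slots (100.00 %)
```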

I think my recommendation would be to report the % slots utilized and 
file an RFE to add the % hosts utilized in case someone's feeling 
ambitious. :)


brettlee wrote:
> reuti wrote:
>> On 30.12.2009 at 22:47, brettlee wrote:
>>> The output from the command below currently reports a metric derived
>>> from a hard-coded TACC value for the total number of slots:
>>> bash$ /export/share/sge/examples/jobsbin/sol-amd64/showq
>>> sample output:  0 of 3936 hosts (  0.00 %)
>>> I am seeking to gauge the level of interest in implementing this metric
>>> correctly, as well as ideas on how it should be implemented.
>> It looks like someone wanted output like Maui's showq.  That is why
>> it's only in the examples directory and not an official tool in
>> $SGE_ROOT/utilbin.  Although it reads "0 of 3936 hosts", I think it's
>> really slots and not hosts.
> Yes.
>> What would you like to get here?  The problem is that SGE is flexible,
>> as you can have multiple queues/slots per machine.  One output you
>> could use is the one from:
>> $ qstat -g c
> Good question.  In fact, that is exactly what I am trying to determine 
> with help from the community.
> My belief is that the current output should either be removed or 
> modified to produce an accurate metric, regardless of whether (as was 
> pointed out) it is in the examples directory.
> If the output is to be modified, exactly what metric should be reported?
> And if some metric is to be reported, should it be added to qstat or
> simply remain in showq in the examples directory?  Then there is the
> issue of SGE's flexibility, and of the other methods already sufficient
> to obtain the metric (qstat, ARCo).  As pointed out, "qstat -g c"
> produces a similar (possibly even identical; I checked the code but
> couldn't tell) metric.
> IMO, the % util seemed like a good data point to track over time, but 
> I've begun to wonder what does it really tell us?  I recall Dan T 
> discussing something similar to this, with regard to measuring data 
> points and actually getting more jobs running in the queue and 
> completing faster.  For example, does the % util indicate we have a 
> bottleneck somewhere that can be addressed with more CPU slots?  Or with
> more of something else?  Or is it simply a good metric to indicate the value
> and usage of the system over time?
> One argument against this metric (active jobs / total slots in queue) 
> would be that the "understood" data point would seem to be skewed and 
> thus lose reliability when slots were unavailable for execution in the 
> queue (down for service or otherwise allocated).
> Thus this open question to the community about the desired course of action.
> Reuti - Thanks for the details below on Torque.  It helped me connect 
> the dots when drawing a distinction between Torque and Maui.
>> I'm not sure how fixed the assignment of queues/slots/hosts is in
>> Maui/Torque*, and whether such an environment is better suited to
>> this kind of output.
>> -- Reuti
>> *) Torque is another queuing system and needs an external scheduler
>> like Maui or Moab for a more sophisticated setup.
>>> Details can be found at:
>>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=3218
>>> Thanks!  -Brett
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235675
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

More information about the gridengine-users mailing list