[GE users] gridengine/clients showq client shows wrong amount of hosts

templedf dan.templeton at sun.com
Thu Dec 31 19:40:54 GMT 2009


I think the % utilized slots is a useful metric, but as you said, it's 
already available from qstat -g c.  Does it really need to be included 
in showq as well?  Maybe?  Anything to simplify the admin's life is good.
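
Here is a minimal sketch of the % slots utilized figure. It assumes the per-cluster-queue used and total slot counts (as shown by qstat -g c) have already been collected into a list. The queue names and numbers are made up for illustration:

```python
# Hypothetical per-cluster-queue counts: (queue name, used slots, total slots),
# e.g. copied by hand from a `qstat -g c` listing.
queues = [
    ("all.q", 30, 64),
    ("long.q", 10, 16),
]

def percent_slots_utilized(rows):
    """Cluster-wide % of slots in use: sum the counts first, then divide.

    Averaging the per-queue percentages instead would weight a 16-slot
    queue as heavily as a 64-slot one.
    """
    used = sum(u for _, u, _ in rows)
    total = sum(t for _, _, t in rows)
    if total == 0:
        return 0.0
    return 100.0 * used / total

print(f"{percent_slots_utilized(queues):.2f} % of slots in use")
```

Summing before dividing gives 50.00 % here. Averaging the two per-queue percentages would give about 54.69 %. With several queues per host, the queue-level totals can also add up to more slots than the hosts can actually run, so the denominator needs the same care Reuti raises further down.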

The % utilized hosts is what showq apparently intends to report, but is 
that a useful metric?  It might be.  It could be used to watch that the 
scheduler is behaving correctly, for example not piling jobs up on a few 
hosts instead of spreading them out.  In general, though, that's only 
useful after a configuration change.  It would also be a pain to 
calculate from what GDI will tell you.
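
The next sketch computes the % hosts utilized figure. It assumes a snapshot that maps each host to its number of occupied slots. Gathering that snapshot from GDI is the painful part and is not shown, and the host names are hypothetical. Both snapshots below have 8 busy slots, so the % slots figure cannot tell them apart. The host figure shows whether jobs were packed onto a few hosts or spread out:

```python
# Hypothetical snapshots: host name -> number of occupied slots on that host.
packed = {"node01": 4, "node02": 4, "node03": 0, "node04": 0}
spread = {"node01": 2, "node02": 2, "node03": 2, "node04": 2}

def percent_hosts_utilized(slots_by_host):
    """% of hosts running at least one job slot."""
    if not slots_by_host:
        return 0.0
    busy = sum(1 for n in slots_by_host.values() if n > 0)
    return 100.0 * busy / len(slots_by_host)

print(percent_hosts_utilized(packed))  # 50.0
print(percent_hosts_utilized(spread))  # 100.0
```

Whether the packed or the spread pattern is the "correct" one depends on how the scheduler has been configured. The number is only useful as a check against that intent.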

The # used slots / # hosts metric that showq actually reports is useless 
as far as I can tell, even if the number of hosts weren't hard-coded to 
TACC's value.
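
The following sketch shows why the current figure misleads. It reconstructs the arithmetic described in this thread: used slots divided by a fixed total, then labelled as hosts. The reconstruction is based on the sample output quoted below, not on the showq source. `HARDCODED_TOTAL` is a hypothetical name for the fixed value:

```python
# Assumption: mirrors the arithmetic described in this thread, not the
# actual showq code. The constant is the value seen in the sample output.
HARDCODED_TOTAL = 3936

def showq_style_line(used_slots, total=HARDCODED_TOTAL):
    """Format used slots against a fixed total, the way the sample output reads."""
    pct = 100.0 * used_slots / total
    return f"{used_slots} of {total} hosts ({pct:6.2f} %)"

print(showq_style_line(0))   # matches the sample output
print(showq_style_line(64))  # a 64-slot cluster with every slot busy
```

With this arithmetic, a 64-slot cluster with every slot busy prints `64 of 3936 hosts (  1.63 %)`. This is why the figure means nothing outside TACC.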

I think my recommendation would be to report the % slots utilized and 
file an RFE to add the % hosts utilized in case someone's feeling 
ambitious. :)
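
One way to act on that recommendation also addresses a concern raised further down in the thread, about slots that are down for service. The idea is to report the raw figure next to the figure computed against available slots only. This is a minimal sketch: `slot_utilization` and its `unavailable` argument are hypothetical, and how the unavailable count is obtained is left open:

```python
def slot_utilization(used, total, unavailable=0):
    """Return (% of all slots in use, % of available slots in use).

    `unavailable` is a hypothetical count of slots that cannot run jobs
    right now, e.g. on hosts disabled or down for service.
    """
    raw = 100.0 * used / total if total else 0.0
    available = total - unavailable
    adjusted = 100.0 * used / available if available > 0 else 0.0
    return raw, adjusted

# 48 of 64 slots busy while 16 slots sit on a host that is down for service.
raw, adjusted = slot_utilization(48, 64, unavailable=16)
print(f"raw {raw:.2f} %, of available {adjusted:.2f} %")
```

Reporting both numbers keeps the raw figure comparable over time. It also makes it obvious when a low value comes from capacity being offline rather than from a lack of work.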

Daniel

brettlee wrote:
> reuti wrote:
>   
>> Am 30.12.2009 um 22:47 schrieb brettlee:
>>
>>     
>>> The output from the command below currently reports a metric derived
>>> from a hard-coded TACC value for the total number of slots:
>>>
>>> bash$ /export/share/sge/examples/jobsbin/sol-amd64/showq
>>>
>>> sample output:  0 of 3936 hosts (  0.00 %)
>>>
>>> I'm seeking to gauge the level of interest in implementing this metric
>>> correctly, as well as suggestions for how it should be implemented.
>>>       
>> It looks like someone wanted output like Maui's showq. That's
>> presumably why it's only in the examples directory and not an official
>> tool in $SGE_ROOT/utilbin. Although it reads "0 of 3936 hosts", I
>> think it's really slots and not hosts.
>>
>>     
> Yes.
>   
>> What would you like to get here? The problem is that SGE is flexible,
>> as you can have multiple queues/slots per machine. One output you
>> could use is the one from:
>>
>> $ qstat -g c
>>     
>
> Good question.  In fact, that is exactly what I am trying to determine 
> with help from the community.
>
> My belief is that the current output should either be removed or 
> modified to produce an accurate metric, regardless of whether (as was 
> pointed out) it is in the examples directory.
>
> If the output is to be modified, exactly what metric should be reported?
> And if some metric is to be reported, does it get added to qstat or
> does it simply remain in showq in the examples directory?  Then there is
> the issue of SGE's flexibility, and of the existing methods that already
> obtain the metric (qstat, ARCo).  As pointed out, "qstat -g c" produces
> a similar (possibly even identical; I checked the code but couldn't tell)
> metric.
>
> IMO, the % util seemed like a good data point to track over time, but
> I've begun to wonder what it really tells us.  I recall Dan T
> discussing something similar to this, with regard to measuring data
> points and actually getting more jobs running in the queue and
> completing faster.  For example, does the % util indicate we have a
> bottleneck somewhere that can be addressed with more CPU slots?  Or more
> of something else?  Or is it simply a good metric to indicate the value
> and usage of the system over time?
>
> One argument against this metric (active jobs / total slots in queue)
> would be that the data point, as it is commonly understood, becomes
> skewed and thus less reliable when slots are unavailable for execution
> in the queue (down for service or otherwise allocated).
>
> Thus this open question to the community about the desired course of action.
>
> Reuti, thanks for the details below on Torque.  They helped me connect
> the dots when drawing a distinction between Torque and Maui.
>   
>> I'm not sure how fixed the assignment of queues/slots/hosts in
>> Maui/Torque* is, and whether such an environment is better suited to
>> this output.
>>
>> -- Reuti
>>
>> *) Torque is another queueing system and needs an external scheduler
>> like Maui or Moab for a more sophisticated setup.
>>
>>
>>     
>>> Details can be found at:
>>>
>>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=3218
>>>
>>> Thanks!  -Brett
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235675
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>       
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235692
>>     
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235817
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235823
