[GE users] gridengine/clients showq client shows wrong amount of hosts

brettlee brett.lee at sun.com
Thu Dec 31 19:15:59 GMT 2009


reuti wrote:
> On 30.12.2009 at 22:47, brettlee wrote:
> 
>> The output from the command below currently reports a metric derived
>> from a hard-coded TACC value for the total number of slots:
>>
>> bash$ /export/share/sge/examples/jobsbin/sol-amd64/showq
>>
>> sample output:  0 of 3936 hosts (  0.00 %)
>>
>> Seeking to gauge the level of interest in implementing this metric
>> correctly as well as possibilities of how it should be implemented.
> 
> Looks like someone desired an output like the showq of Maui.  
> Therefore it's only in the examples directory and not an official  
> tool in $SGE_ROOT/utilbin. Although it reads "0 of 3936 hosts", I
> think it actually counts slots, not hosts.
> 
Yes.
> What would you like to get here? The problem is that SGE is flexible:
> you can have multiple queues/slots per machine. One output you
> could use is the one from:
> 
> $ qstat -g c

Good question.  In fact, that is exactly what I am trying to determine 
with help from the community.

My belief is that the current output should either be removed or 
modified to produce an accurate metric, regardless of whether (as was 
pointed out) it is in the examples directory.

If the output is to be modified, exactly what metric should be reported?
And if some metric is to be reported, does it get added to qstat or
does it simply remain in showq in the examples directory?  Then there is
the issue of SGE's flexibility, and the fact that other methods (qstat,
ARCo) are already sufficient to obtain the metric.  As pointed out,
"qstat -g c" produces a similar metric (possibly even an identical one;
I checked the code but couldn't tell).

IMO, the % util seemed like a good data point to track over time, but
I've begun to wonder what it really tells us.  I recall Dan T
discussing something similar to this, with regard to measuring data
points and actually getting more jobs running in the queue and
completing faster.  For example, does the % util indicate we have a
bottleneck somewhere that can be addressed with more CPU slots, or with
more of something else?  Or is it simply a good metric to indicate the
value and usage of the system over time?

One argument against this metric (active jobs / total slots in queue)
is that it becomes skewed, and thus less reliable, whenever slots in the
queue are unavailable for execution (down for service or otherwise
allocated).
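
To make the skew concrete, here is a minimal Python sketch.  It is only
an illustration, not how showq computes its figure: the function names
are made up, it counts used slots rather than jobs (i.e. it assumes one
slot per active job), and apart from the 3936 total taken from the
sample output above, the numbers are invented.

```python
def utilization(used_slots, total_slots):
    """Percent of slots in use; returns 0.0 when there are no slots."""
    if total_slots <= 0:
        return 0.0
    return 100.0 * used_slots / total_slots


def adjusted_utilization(used_slots, total_slots, unavailable_slots):
    """Same ratio, but measured against slots that could actually run jobs."""
    return utilization(used_slots, total_slots - unavailable_slots)


# Hypothetical numbers: 3936 is the total from the showq sample output;
# the used and unavailable counts are invented for illustration.
total = 3936
used = 1968
unavailable = 984  # e.g. hosts down for service

raw = utilization(used, total)
adjusted = adjusted_utilization(used, total, unavailable)
print("raw:      %6.2f %%" % raw)
print("adjusted: %6.2f %%" % adjusted)
```

With a quarter of the slots out of service, the same workload reads as
50 % against the configured total but about 67 % against the slots that
could actually run jobs.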

Thus this open question to the community about the desired course of action.

Reuti - Thanks for the details below on Torque.  It helped me connect 
the dots when drawing a distinction between Torque and Maui.
> 
> I'm not sure how fixed the assignment of queues/slots/hosts is in
> Maui/Torque*, and whether this output is better suited to such an
> environment.
> 
> -- Reuti
> 
> *) Torque is another queuing system and needs an external scheduler
> like Maui or Moab for a more sophisticated setup.
> 
> 
>> Details can be found at:
>>
>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=3218
>>
>> Thanks!  -Brett
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235675
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235692
> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235817



