[GE users] Is qstat -g c reporting the wrong results

Reuti reuti at staff.uni-marburg.de
Tue Jul 8 20:42:47 BST 2008


Hi,

On 08.07.2008 at 20:48, Craig Tierney wrote:

> I was looking at the output of qstat -g c today for the queues I have on
> my system, and I am wondering if the output is correct.
>
> I have two queues with associated hosts:
>
> Queue        Hostlist
> ---------------------
> qwcomp.q     @wcomp
> qwbigmem.q   wserial0 @wcomp
>
> So qwbigmem is just the same hostlist as qwcomp.q with the exception of
> wserial0 (which happens to have extra memory).
>
> Every node has 4 cores.  To ensure that every host only ever has 4 slots
> in use, I set the complex_value for the execution host with slots=4.
>
> The hostgroup @wcomp has 341 hosts (1364 slots).  Since qwbigmem.q has
> one extra node, it has 1368 slots.
>
> When I submit a job to qwcomp.q, the output from qstat -g c shows
> the slots in use and the slots available.  If I am using 200 slots
> in the qwcomp.q queue, qstat -g c will show:
>
>
> CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
> -------------------------------------------------------------------------
> qwbigmem0.q                       0.15      0   1368   1368      0      0
> qwcomp.q                          0.15    200   1164   1364      0      0
>
> My question is, why is qwbigmem0.q saying there are 1368 slots available?
> There aren't, because the consumable resource slots has been consumed
> by the jobs in the other queues.  Why doesn't qstat -g c report that only
> 1168 slots are available in qwbigmem0.q?

qstat -g c is simply not sophisticated enough to account for more than one
queue per node. Maybe a better heading would be "UNUSED" or "IDLING"
(meaning an idle slot in the cluster queue, not a completely idle host).
If you read the column as real availability, an output of 1368 + 1364
available slots is always wrong, since the hosts only provide 1368 slots
in total; hence the output would have to be completely different to be
more correct in a mathematical sense.
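
To make the arithmetic concrete, here is a rough Python sketch of the two ways of counting. It is not how qstat is implemented; the function name, the host names and the data layout are made up for illustration, and it assumes each queue offers exactly as many slots per host as the host's slots=4 limit. The queue names follow the hostlist table in Craig's mail.

```python
def cluster_queue_summary(queue_hosts, host_slots, used):
    """Return {queue: (used, per_queue_avail, host_aware_avail, total)}.

    queue_hosts: queue name -> list of host names (hypothetical layout)
    host_slots:  host name -> slot limit on that host (slots=N)
    used:        (queue, host) -> slots that queue occupies on that host
    """
    # Slots consumed on each host, summed over all queues.
    host_used = {}
    for (_queue, host), n in used.items():
        host_used[host] = host_used.get(host, 0) + n

    summary = {}
    for queue, hosts in queue_hosts.items():
        total = sum(host_slots[h] for h in hosts)
        q_used = sum(used.get((queue, h), 0) for h in hosts)
        # Per-queue view: only this queue's own jobs are subtracted.
        per_queue_avail = total - q_used
        # Host-aware view: jobs of every queue on the host are subtracted.
        host_aware_avail = sum(host_slots[h] - host_used.get(h, 0) for h in hosts)
        summary[queue] = (q_used, per_queue_avail, host_aware_avail, total)
    return summary


# 341 hosts in @wcomp plus wserial0, 4 slots each (host names are invented).
wcomp = [f"w{i:03d}" for i in range(341)]
queue_hosts = {"qwcomp.q": wcomp, "qwbigmem.q": ["wserial0"] + wcomp}
host_slots = {h: 4 for h in ["wserial0"] + wcomp}
# 200 slots in use in qwcomp.q: 50 hosts completely filled.
used = {("qwcomp.q", h): 4 for h in wcomp[:50]}

summary = cluster_queue_summary(queue_hosts, host_slots, used)
for queue, (u, per_queue, host_aware, total) in summary.items():
    print(f"{queue:12s} used={u} per-queue avail={per_queue} "
          f"host-aware avail={host_aware} total={total}")
```

The per-queue figure for qwbigmem.q comes out as 1368, matching what qstat -g c prints, while the host-aware figure is the 1168 Craig expected.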

If you really took all the data into account, the load reported for
qwbigmem0.q should also read zero (but tracking down the real consumption
per queue to show the "real" value would be an advanced feature, e.g. once
you start to oversubscribe a node). Not to mention any limits imposed by
RQSs (resource quota sets). This could even lead to different output for
different users.

Nevertheless, feel free to enter an RFE for a "personalized" command, let's
call it:

qavail [ -u USER ] [ -l ... ] [ -qtype batch | interactive | parallel ] [ -pe ... ] ...

which would honor all limits and restrictions.

-- Reuti
