[GE issues] [Issue 3218] New - showq client shows wrong amount of hosts

brettlee brett.lee at sun.com
Wed Dec 30 21:29:35 GMT 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3218
                 Issue #|3218
                 Summary|showq client shows wrong amount of hosts
               Component|gridengine
                 Version|6.2u4
                Platform|All
                     URL|
              OS/Version|All
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|clients
             Assigned to|roland
             Reported by|brettlee






------- Additional comments from brettlee at sunsource.net Wed Dec 30 13:29:34 -0800 2009 -------
The showq client shows the wrong number of hosts because the total is hard-coded for the special case of the code contributor's site.

This is from a post to the dev mailing list:

-bash-3.2# /export/share/sge/examples/jobsbin/sol-amd64/showq
ACTIVE JOBS--------------------------
JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
================================================================================

     0 active jobs :    0 of 3936 hosts (  0.00 %)

WAITING JOBS------------------------
JOBID     JOBNAME    USERNAME      STATE   CORE  WCLIMIT    QUEUETIME
================================================================================

WAITING JOBS WITH JOB DEPENDENCIES---
JOBID     JOBNAME    USERNAME      STATE   CORE  WCLIMIT    QUEUETIME
================================================================================

UNSCHEDULED JOBS---------------------
JOBID     JOBNAME    USERNAME      STATE   CORE  WCLIMIT    QUEUETIME
================================================================================

Total jobs: 0     Active Jobs: 0     Waiting Jobs: 0     Dep/Unsched Jobs: 0

Notice that it says I'm using 0 of 3936 hosts.  There are 2 hosts in my cluster.  I'm gonna say there's still some stuff that's hard-coded
for TACC in there.
*** (#1 of 1): 2009-11-11 09:19:29 MST d.gruber at sun.com

In the showq_show_job_tacc() function, that number is hard-coded:

total_slot_count = 82 * 4 * 12 * 16;

printf("%6d active jobs : %4d of %4d hosts (%6.2f %%)\n",
       active_job_count,
       (int) ceil(active_slot_count / 16.0),
       (int) ceil(total_slot_count / 16.0),
       100 * active_slot_count / (float) total_slot_count);
*** (#1 of 1): 2009-11-16 12:08:21 MST michael.pospisil at sun.com
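
As a quick check of where 3936 comes from, the arithmetic in the snippet above can be reproduced in isolation. This is only a sketch: slots_to_hosts() is a hypothetical helper name, not a function in showq, and it uses integer rounding in place of ceil(), which gives the same result for non-negative slot counts.

```c
#include <assert.h>

/* Hypothetical helper (not part of showq): converts a slot count to a
 * host count by rounding up, like (int) ceil(slots / 16.0) in the
 * quoted printf when slots_per_host is 16. */
static int slots_to_hosts(int slot_count, int slots_per_host)
{
    assert(slots_per_host > 0);
    return (slot_count + slots_per_host - 1) / slots_per_host;
}

/* The hard-coded total from showq_show_job_tacc(): 82 * 4 * 12 * 16. */
static int tacc_total_slot_count(void)
{
    return 82 * 4 * 12 * 16;
}
```

With these definitions, tacc_total_slot_count() is 62976 slots, and slots_to_hosts(62976, 16) is 3936, which matches the "0 of 3936 hosts" line reported on a 2-host cluster.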

The current code reports the number of active slots divided by the total number of slots and, as expected, the total slots value is
calculated in a way that is specific to TACC.  To make this functionality work as expected for all sites, showq would need to determine
which hosts are currently in the queue and how many slots each of those hosts has, sum the slots, and use that sum as the divisor.
Because this metric is not currently available, anything beyond removing the erroneous output for sites other than TACC would seem to be
an RFE.
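
The site-independent calculation described above might look roughly like the following. This is a sketch under assumptions, not the showq implementation: struct host_slots, sum_total_slots() and utilization_percent() are hypothetical names, and the per-host slot counts are assumed to have been fetched elsewhere (how showq would obtain them is not covered here).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-host record; a real client would have to fill this
 * in from the cluster's host and queue configuration. */
struct host_slots {
    const char *name;
    int slots;
};

/* Sum the slots of every host instead of using a hard-coded constant. */
static int sum_total_slots(const struct host_slots *hosts, size_t n)
{
    int total = 0;
    assert(hosts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += hosts[i].slots;
    }
    return total;
}

/* Percentage of slots in use; returns 0 when there are no slots so the
 * caller never divides by zero. */
static double utilization_percent(int active_slots, int total_slots)
{
    if (total_slots <= 0) {
        return 0.0;
    }
    return 100.0 * active_slots / total_slots;
}
```

For a 2-host cluster with 4 slots per host, sum_total_slots() would return 8, and 2 active slots would be reported as 25 % utilization.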

This utilization metric seems like a significant data point to capture and report over time.  Seeking guidance from the community on
whether this functionality should be implemented and, if so, how:  1) every time showq is run, 2) only when showq is run with an option
requesting the metric, or 3) some other combination or approach.

Note: the "WAITING JOBS-----" output seems like it could use one more "-" to line up with the others.  See:
ACTIVE JOBS--------------------------
WAITING JOBS------------------------
WAITING JOBS WITH JOB DEPENDENCIES---
UNSCHEDULED JOBS---------------------

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=235669
