Opened 10 years ago

Last modified 9 years ago

#765 new defect

IZ3218: showq client shows wrong amount of hosts

Reported by: brettlee Owned by:
Priority: normal Milestone:
Component: sge Version: 6.2u4
Severity: Keywords: clients
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3218]

        Issue #:      3218             Platform:     All      Reporter: brettlee (brettlee)
       Component:     gridengine          OS:        All
     Subcomponent:    clients          Version:      6.2u4       CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    roland (roland)
      QA Contact:     roland
          URL:
       * Summary:     showq client shows wrong amount of hosts
   Status whiteboard:
      Attachments:

     Issue 3218 blocks:
   Votes for issue 3218:


   Opened: Wed Dec 30 14:29:00 -0700 2009 
------------------------


The showq client shows the wrong number of hosts because the total is hard-coded for the site of the original code contributor (TACC).

The following is from a post to the dev mailing list:

-bash-3.2# /export/share/sge/examples/jobsbin/sol-amd64/showq
ACTIVE JOBS--------------------------
JOBID     JOBNAME    USERNAME      STATE   CORE  REMAINING  STARTTIME
================================================================================

     0 active jobs :    0 of 3936 hosts (  0.00 %)

WAITING JOBS------------------------
JOBID     JOBNAME    USERNAME      STATE   CORE  WCLIMIT    QUEUETIME
================================================================================

WAITING JOBS WITH JOB DEPENDENCIES---
JOBID     JOBNAME    USERNAME      STATE   CORE  WCLIMIT    QUEUETIME
================================================================================

UNSCHEDULED JOBS---------------------
JOBID     JOBNAME    USERNAME      STATE   CORE  WCLIMIT    QUEUETIME
================================================================================

Total jobs: 0     Active Jobs: 0     Waiting Jobs: 0     Dep/Unsched Jobs: 0

Notice that it says I'm using 0 of 3936 hosts.  There are 2 hosts in my cluster.  I'm gonna say there's still some stuff that's hard-coded
for TACC in there.
*** (#1 of 1): 2009-11-11 09:19:29 MST d.gruber@sun.com

In the showq_show_job_tacc() function, that number is hard-coded:

total_slot_count = 82 * 4 * 12 * 16;

printf("%6d active jobs : %4d of %4d hosts (%6.2f %%)\n",
       active_job_count,
       (int) ceil(active_slot_count / 16.0),
       (int) ceil(total_slot_count / 16.0),
       100 * active_slot_count / (float) total_slot_count);
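
As a sanity check on where the 3936 in the output comes from, the arithmetic can be sketched as below. This is only an illustration, not code from showq: the helper names hosts_for_slots() and tacc_total_slot_count() are made up for this note, and integer ceiling division stands in for the ceil() call in the original.

```c
/* Ceiling division: number of hosts needed to hold `slots` slots when
 * every host is assumed to have `slots_per_host` slots.
 * Hypothetical helper, not part of showq. */
int hosts_for_slots(int slots, int slots_per_host)
{
    return (slots + slots_per_host - 1) / slots_per_host;
}

/* The hard-coded total quoted above from showq_show_job_tacc(). */
int tacc_total_slot_count(void)
{
    return 82 * 4 * 12 * 16;   /* 62976 slots */
}
```

hosts_for_slots(tacc_total_slot_count(), 16) gives 3936, matching the "0 of 3936 hosts" line regardless of how many hosts the cluster really has.
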
*** (#1 of 1): 2009-11-16 12:08:21 MST michael.pospisil@sun.com

The current code reports the number of active slots divided by the total number of slots, and as expected the total slots value is
calculated in a way that is specific to TACC.  To make this functionality work as expected for all sites, the task at hand would be to
determine which hosts are currently in the queue and how many slots each of those hosts has, sum up the slots, and then use that sum as
the divisor.  As this metric is currently not available, anything beyond removing the erroneous output outside of TACC would seem to be
an RFE.
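
A minimal sketch of the summation described above, under stated assumptions: struct host_slots and both function names are hypothetical, and how showq would actually obtain the per-host slot counts from the qmaster is not shown and is not something this ticket specifies.

```c
#include <stddef.h>

/* Hypothetical per-host record; the real data would have to be
 * fetched from the qmaster, which this sketch does not attempt. */
struct host_slots {
    const char *name;   /* host name */
    int slots;          /* slots configured on this host */
};

/* Sum the configured slots over all hosts in the queue.
 * The result would replace the hard-coded total_slot_count. */
int total_slots(const struct host_slots *hosts, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += hosts[i].slots;
    return sum;
}

/* Utilization in percent; returns 0 for an empty cluster
 * instead of dividing by zero. */
double utilization_percent(int active_slots, int total)
{
    return total > 0 ? 100.0 * active_slots / total : 0.0;
}
```

With per-host counts, the host total is simply the number of entries in the array, so the fixed divisor of 16 slots per host would no longer be needed.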

This utilization metric seems like a significant data point to capture and report over time.  Seeking guidance from the community on
whether this functionality should be implemented or not, and if so, how:  1) every time showq is run, 2) only when showq is run with an
option requesting the metric, or 3) some other combination or approach.

Note: the "WAITING JOBS-----" output seems like it could use one more "-" to line up with the others.  See:
ACTIVE JOBS--------------------------
WAITING JOBS------------------------
WAITING JOBS WITH JOB DEPENDENCIES---
UNSCHEDULED JOBS---------------------
