Opened 13 years ago

Closed 7 years ago

#369 closed defect (fixed)

IZ2075: help output for qstat -explain should mention only usable for queue instance

Reported by: ovid Owned by: Dave Love <d.love@…>
Priority: lowest Milestone:
Component: sge Version: 6.0u4
Severity: Keywords: Sun clients
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2075]

        Issue #:      2075             Platform:     Sun      Reporter: ovid (ovid)
       Component:     gridengine          OS:        All
     Subcomponent:    clients          Version:      6.0u4       CC:    None defined
        Status:       NEW              Priority:     P5
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    roland (roland)
      QA Contact:     roland
          URL:
       * Summary:     help output for qstat -explain should mention only usable for queue instance
   Status whiteboard:
      Attachments:


   Opened: Thu Jun 15 15:50:00 -0700 2006 
------------------------


qstat -explain E broken

Here's the plain output:


sgetest@dt218-130# qstat -explain E
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@dt218-130                BIPC  1/20      0.05     lx24-x86
   1378 0.50617 PENDING    sgetest      r     06/15/2004 15:39:03     1 2
----------------------------------------------------------------------------
all.q@dt218-141                BIPC  2/40      0.02     sol-x86
   1374 0.55500 ARRAY      sgetest      r     06/15/2004 15:39:03     1 2
   1378 0.50617 PENDING    sgetest      r     06/15/2004 15:39:03     1 1
----------------------------------------------------------------------------
all.q@dt218-155                BIPC  0/20      0.36     sol-amd64
----------------------------------------------------------------------------
all.q@dt218-170                BIPC  2/40      0.00     sol-amd64
   1374 0.55500 ARRAY      sgetest      r     06/15/2004 15:39:03     1 3
   1378 0.50617 PENDING    sgetest      r     06/15/2004 15:39:03     1 3
----------------------------------------------------------------------------
all.q@dt218-32                 BIPC  3/4       0.01     sol-sparc
   1373 0.55500 SEQUENTIAL sgetest      r     06/15/2004 15:39:03     1
   1374 0.55500 ARRAY      sgetest      r     06/15/2004 15:39:03     1 4
   1378 0.50617 PENDING    sgetest      r     06/15/2004 15:39:03     1 4
----------------------------------------------------------------------------
all.q@dt218-65                 BIPC  1/40      0.02     sol-sparc64
   1374 0.55500 ARRAY      sgetest      r     06/15/2004 15:39:03     1 1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
   1376 0.55500 ERROR      sgetest      Eqw   06/15/2004 15:38:56     1
   1377 0.00000 HOLD       sgetest      hqw   06/15/2004 15:38:56     1


Pretty much like qstat -f.

The XML output is no better:

sgetest@dt218-130# qstat -explain E -xml
<?xml version='1.0'?>
<job_info  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <queue_info>
    <Queue-List>
      <name>all.q@dt218-130</name>
      <qtype>BIPC</qtype>
      <slots_used>1</slots_used>
      <slots_total>20</slots_total>
      <load_avg>0.04000</load_avg>
      <arch>lx24-x86</arch>
      <job_list state="running">
        <JB_job_number>1378</JB_job_number>
        <JAT_prio>0.50617</JAT_prio>
        <JB_name>PENDING</JB_name>
        <JB_owner>sgetest</JB_owner>
        <state>r</state>
        <JAT_start_time>06/15/2004 15:39:03</JAT_start_time>
        <slots>1</slots>
        <tasks>2</tasks>
      </job_list>
    </Queue-List>
....

<job_info>
    <job_list state="pending">
      <JB_job_number>1376</JB_job_number>
      <JAT_prio>0.55500</JAT_prio>
      <JB_name>ERROR</JB_name>
      <JB_owner>sgetest</JB_owner>
      <state>Eqw</state>
      <JB_submission_time>06/15/2004 15:38:56</JB_submission_time>
      <slots>1</slots>
    </job_list>
    <job_list state="pending">
      <JB_job_number>1377</JB_job_number>
      <JAT_prio>0.00000</JAT_prio>
      <JB_name>HOLD</JB_name>
      <JB_owner>sgetest</JB_owner>
      <state>hqw</state>
      <JB_submission_time>06/15/2004 15:38:56</JB_submission_time>
      <slots>1</slots>
    </job_list>
  </job_info>
</job_info>


But the help flag says

sgetest@dt218-130#  qstat -help
SGE 6.0u4
usage: qstat [options]
        [-ext]                            view additional attributes
        [-explain a|c|A|E]                show reason for c(onfiguration
amiguous), a(larm), suspend A(larm), E(rror) state

   ------- Additional comments from roland Thu Nov 16 01:04:17 -0700 2006 -------
The comments section is incorrect because it mixes up job error and queue error.
The "qstat -qs E" switch is a queue filter and shows all queues in error state
AND all pending jobs. "qstat -f -qs E" should print the same output as
"qstat -qs E". This is not the case; it prints nothing.

My suspicion is that qstat -f recognizes that no queues are selected and then
prints no output. This is wrong because it still has to print the pending jobs.

   ------- Additional comments from roland Thu Nov 16 01:07:37 -0700 2006 -------
The previous comment was posted to the wrong issue (it belongs to issue 2073).

   ------- Additional comments from roland Thu Nov 16 01:08:51 -0700 2006 -------
qstat is not broken. The '-explain' switch shows only the state of a queue
instance. It's correctly documented in the man page:
-explain a|A|c|E
          'c' displays the reason for the configuration ambiguous
          state of a queue instance. 'a' shows the reason for the
          alarm  state.  Suspend  alarm  state  reasons  will  be
          displayed  by  'A'. 'E' displays the reason for a queue
          instance error state.

In the example above, it is a job that is in error state, not a queue instance.
The reason for this error can be found in the 'qstat -j <jobid>' output.

The -help output should be improved.
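
To illustrate the distinction, here is a minimal sketch (not part of Grid Engine) that reads the pending-job XML shown in the report and prints the 'qstat -j' command to run for each job in error state. The element names (job_list, JB_job_number, state) are taken from the XML output above; the helper name jobs_in_error, the trimmed SAMPLE string, and the rule that an "E" in the job state marks an errored job are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the pending-job part of the XML quoted in this ticket.
SAMPLE = """<?xml version='1.0'?>
<job_info xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <job_info>
    <job_list state="pending">
      <JB_job_number>1376</JB_job_number>
      <JB_name>ERROR</JB_name>
      <state>Eqw</state>
    </job_list>
    <job_list state="pending">
      <JB_job_number>1377</JB_job_number>
      <JB_name>HOLD</JB_name>
      <state>hqw</state>
    </job_list>
  </job_info>
</job_info>
"""


def jobs_in_error(xml_text):
    """Return the job numbers whose <state> contains 'E' (assumed: job error)."""
    root = ET.fromstring(xml_text)
    job_ids = []
    # iter() walks the whole tree, so nested <job_info> wrappers do not matter.
    for job in root.iter("job_list"):
        state = job.findtext("state", default="")
        if "E" in state:
            job_ids.append(job.findtext("JB_job_number"))
    return job_ids


# Print the command that shows the error reason for each errored job.
for job_id in jobs_in_error(SAMPLE):
    print(f"qstat -j {job_id}")
```

With the sample from the report this selects job 1376 (state Eqw) and skips job 1377 (state hqw), which matches the point above: the error belongs to the job, so '-explain E' has nothing to report for the queue instances.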

Change History (1)

comment:1 Changed 7 years ago by Dave Love <d.love@…>

  • Owner set to Dave Love <d.love@…>
  • Resolution set to fixed
  • Status changed from new to closed

In 4470/sge:

Message fixes
Fixes #369
