Opened 14 years ago

Last modified 9 years ago

#267 new enhancement

IZ1743: Job life-cycle information in reporting(5) file should be easily accessible via CLI

Reported by: andreas Owned by:
Priority: normal Milestone:
Component: sge Version: 6.0
Severity: Keywords: clients
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1743]

        Issue #:      1743             Platform:     All           Reporter: andreas (andreas)
       Component:     gridengine          OS:        All
     Subcomponent:    clients          Version:      6.0              CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    ENHANCEMENT
                                   Target milestone: ---
      Assigned to:    roland (roland)
      QA Contact:     roland
          URL:
       * Summary:     Job life-cycle information in reporting(5) file should be easily accessible via CLI
   Status whiteboard:
      Attachments:

     Issue 1743 blocks:
   Votes for issue 1743:


   Opened: Mon Aug 8 02:59:00 -0700 2005 
------------------------


DESCRIPTION:
The job log information in the reporting(5) file should be easily available via the CLI.
The reporting(5) file contains traces of job-related events that can
be critical for understanding occurrences within an N1GE cluster.

E.g. for a job that was deleted via the qdel command, the accounting(5)-based qacct command
reports only vaguely that the job died, without giving the reason:

> qacct -j 83126
==============================================================
qname        all.q
hostname     pippin
group        staff
owner        ah114088
project      NONE
department   dept1
jobname      Sleeper
jobnumber    83126
taskid       undefined
account      sge
priority     0
qsub_time    Mon Aug  8 11:42:05 2005
start_time   Mon Aug  8 11:46:09 2005
end_time     Mon Aug  8 11:46:20 2005
granted_pe   NONE
slots        1
failed       100 : assumedly after job
exit_status  137
ru_wallclock 11
ru_utime     0

even in cases when the reporting(5) file knows much more about the life cycle of
the very same job:

> grep ":83126:" $SGE_ROOT/default/common/reporting
1123494125:new_job:1123494125:83126:-1:NONE:Sleeper:ah114088:staff::dept1:sge:-2753074035712
1123494126:job_log:1123494126:sent:83126:0:NONE:t:master:dain:-2753074036736:-2753074035712:5418461421:Sleeper:ah114088:staff::dept1:sge:sent to execd
1123494126:job_log:1123494126:delivered:83126:0:NONE:r:master:dain:-2753074036736:-2753074035712:1123494125:Sleeper:ah114088:staff::dept1:sge:job received by execd
1123494138:job_log:1123494138:deleted:83126:0:NONE:r:ah114088:pippin.germany.sun.com:-2753074036736:-2753074035712:1123494125:Sleeper:ah114088:staff::dept1:sge:job deleted
1123494139:acct:all.q:pippin:staff:ah114088:Sleeper:83126:sge:0:1123494125:1123494369:1123494380:100:137:11:0:0:0.000000:0:0:0:0:0:0:0:0.000000:61:396:0:2:470:0:NONE:dept1:NONE:1:0:0.000000:0.000000:0.000000:-U dept1,dept2 -q all.q@pippin:0.000000:NONE:0.000000
1123494139:job_log:1123494139:deleted:83126:0:NONE:r:execution daemon:pippin:0:-2753074035712:1123494125:Sleeper:ah114088:staff::dept1:sge:job removed
1123494139:job_log:1123494139:finished:83126:0:NONE:r:master:dain:-2753074036736:-2753074035712:1123494125:Sleeper:ah114088:staff::dept1:sge:job waits for schedds deletion
1123494140:job_log:1123494140:deleted:83126:0:NONE:T:scheduler:dain:0:1024:1123494125:Sleeper:ah114088:staff::dept1:sge:job deleted by schedd

HOWTOFIX:
A solution could be to enhance qacct -j <jobid> and qstat -j <jobid> so that
job life-cycle information becomes easily available.

WORKAROUND:
A workaround could be qacct/qstat wrapper scripts that read the reporting(5)
file when called with the -j <jobid> option.
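
A minimal sketch of what the core of such a wrapper might look like, in Python.
It is only an illustration, not an existing N1GE tool: the function names
job_events and format_life_cycle are hypothetical, and the field positions
(event time, event, job number, initiator, host, trailing message) are inferred
from the sample job_log records above rather than taken from the reporting(5)
specification. It also assumes one record per line and a message text that
contains no ':' character.

```python
from datetime import datetime, timezone


def job_events(lines, job_id):
    """Yield (time, event, initiator, host, message) for one job's job_log records.

    Field positions are an assumption based on the sample records in this issue.
    """
    wanted = str(job_id)
    for line in lines:
        fields = line.rstrip("\n").split(":")
        # Skip short lines, non-job_log records (new_job, acct, ...) and other jobs.
        if len(fields) < 11 or fields[1] != "job_log" or fields[4] != wanted:
            continue
        when = datetime.fromtimestamp(int(fields[2]), tz=timezone.utc)
        # fields[3] = event, fields[8] = initiator, fields[9] = host,
        # last field = free-text message (assumed to contain no ':').
        yield when, fields[3], fields[8], fields[9], fields[-1]


def format_life_cycle(lines, job_id):
    """Return one human-readable line per job_log event of the given job."""
    return [
        f"{when:%Y-%m-%d %H:%M:%S} UTC  {event:<10} by {who}@{host}: {message}"
        for when, event, who, host, message in job_events(lines, job_id)
    ]
```

A wrapper could open $SGE_ROOT/default/common/reporting, pass the file object
and the job ID to format_life_cycle, and print the returned lines after the
regular qacct -j <jobid> output.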

   ------- Additional comments from andreas Mon Aug 8 04:16:35 -0700 2005 -------
There is a relation to issue 483

Change History (0)
