Opened 18 years ago

Closed 8 years ago

#85 closed enhancement (fixed)

IZ483: reporting of reason for job abort

Reported by: fy Owned by:
Priority: normal Milestone:
Component: sge Version: 5.3p2
Severity: minor Keywords: execution


[Imported from gridengine issuezilla]

        Issue #:      483              Platform:     All           Reporter: fy (fy)
       Component:     gridengine          OS:        All
     Subcomponent:    execution        Version:      5.3p2            CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    ENHANCEMENT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     pollinger
       * Summary:     reporting of reason for job abort
   Status whiteboard:

     Issue 483 blocks:
   Votes for issue 483:

   Opened: Wed Feb 5 08:11:00 -0700 2003 

When a job exceeds one of the resource limits, say
h_rt, the job is killed with SIGKILL. However,
there is no clear indication as to why the job was
killed. None of the files in $SGE_JOB_SPOOL_DIR
provide any added information, so writing an
epilog does not help. The best we can do is guess
from the failed code (=100), and

The enhancement being asked for is some kind of
reason as to why the job was killed, e.g.
(reason=limit_exceeded, limit=h_rt).

   ------- Additional comments from andreas Fri Apr 15 06:56:13 -0700 2005 -------
If a job was terminated because a limit was exceeded, a new SSTATE_* value such
as SSTATE_LIMIT_EXCEEDED is needed. Each time sge_execd initiates job
termination, a flag must be set for that job. This will allow sge_execd to
report SSTATE_LIMIT_EXCEEDED for those jobs/tasks.
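
A minimal sketch of the flag-then-report idea in the comment above, in C. It is
not taken from the sge_execd sources: the struct, the function names, and the
numeric state values are all hypothetical and carry a "sketch_" prefix to make
that clear. Only the name SSTATE_LIMIT_EXCEEDED comes from this ticket.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state codes. The numeric values are invented for this
 * sketch; only the SSTATE_LIMIT_EXCEEDED name is proposed in the ticket. */
enum sketch_state {
    SKETCH_SSTATE_OTHER_FAILURE  = 0,
    SKETCH_SSTATE_LIMIT_EXCEEDED = 1
};

/* Hypothetical per-job record kept by the execution daemon. */
struct sketch_job {
    unsigned long job_id;
    bool          killed_for_limit; /* set when the daemon starts the kill */
    const char   *limit_name;       /* e.g. "h_rt"; NULL if no limit involved */
};

/* Call at the point where the daemon decides to terminate a job
 * because a resource limit was exceeded. */
void sketch_mark_limit_kill(struct sketch_job *job, const char *limit_name)
{
    job->killed_for_limit = true;
    job->limit_name = limit_name;
}

/* Call when the job's exit is reported: the flag set earlier decides
 * which state is sent upstream. */
enum sketch_state sketch_report_state(const struct sketch_job *job)
{
    return job->killed_for_limit ? SKETCH_SSTATE_LIMIT_EXCEEDED
                                 : SKETCH_SSTATE_OTHER_FAILURE;
}
```

Recording the limit name alongside the flag would also make it possible to
report something like the (reason=limit_exceeded, limit=h_rt) pair asked for in
the original request.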

   ------- Additional comments from andreas Mon Jul 25 02:15:10 -0700 2005 -------
Changed execd logging in Maintrunk from Info to Warning to improve diagnostics.
Further improvements seem possible. Added related comments to

   ------- Additional comments from andreas Mon Aug 8 04:16:35 -0700 2005 -------
See issue 1743 for a reasonable 6.0 based workaround and an RFE on how to generally
improve job diagnostics based on job life-cycle information available in

   ------- Additional comments from sgrell Mon Dec 12 03:04:05 -0700 2005 -------
Changed the subcomponent.


Change History (1)

comment:1 Changed 8 years ago by dlove

  • Resolution set to fixed
  • Severity set to minor
  • Status changed from new to closed

