Opened 18 years ago
Closed 8 years ago
#85 closed enhancement (fixed)
IZ483: reporting of reason for job abort
Reported by: | fy | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | sge | Version: | 5.3p2 |
Severity: | minor | Keywords: | execution |
Cc: |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=483]
Issue #: 483
Platform: All
Reporter: fy (fy)
Component: gridengine
OS: All
Subcomponent: execution
Version: 5.3p2
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: ENHANCEMENT
Target milestone: ---
Assigned to: andreas (andreas)
QA Contact: pollinger
URL:
* Summary: reporting of reason for job abort
Status whiteboard:
Attachments:
Issue 483 blocks:
Votes for issue 483:

Opened: Wed Feb 5 08:11:00 -0700 2003
------------------------

When a job exceeds one of the resource limits, say h_rt, the job is killed with SIGKILL; however, there is no clear indication as to why the job was killed. None of the files in $SGE_JOB_SPOOL_DIR provides any additional information, so writing an epilog does not help. The best we can do is guess from the failed code (=100) and end_time - start_time.

The enhancement being asked for is some kind of reason as to why the job was killed, e.g. (reason=limit_exceeded, limit=h_rt).

------- Additional comments from andreas Fri Apr 15 06:56:13 -0700 2005 -------

HOWTOFIX: In case a job was terminated because a limit was exceeded, a new SSTATE_* such as SSTATE_LIMIT_EXCEEDED is needed. Each time sge_execd initiates job termination, a flag must be set with that job. This will allow sge_execd to report SSTATE_LIMIT_EXCEEDED for those jobs/tasks.

------- Additional comments from andreas Mon Jul 25 02:15:10 -0700 2005 -------

Changed execd logging in Maintrunk from Info to Warning to improve diagnostics. Further improvements seem possible. Added related comments to daemons/execd/execd_signal_queue.c

------- Additional comments from andreas Mon Aug 8 04:16:35 -0700 2005 -------

See issue 1743 for a reasonable 6.0-based workaround and an RFE on how to generally improve job diagnostics based on job life-cycle information available in reporting(5).

------- Additional comments from sgrell Mon Dec 12 03:04:05 -0700 2005 -------

Changed the subcomponent.

Stephan
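
For illustration only, the guesswork mentioned in the original report (failed code 100 plus end_time - start_time) could be sketched roughly as below. This is not Grid Engine code: the `JobRecord` class, its field names, the `guess_abort_reason` helper and the `slack` tolerance are all hypothetical, and the sketch assumes the job's requested h_rt value is known to whoever runs the check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRecord:
    """Hypothetical subset of a finished job's data; field names are illustrative."""
    failed: int            # failed code as mentioned in the report (100 for a killed job)
    start_time: int        # job start, seconds since the epoch
    end_time: int          # job end, seconds since the epoch
    h_rt: Optional[int]    # requested hard runtime limit in seconds, None if not requested


def guess_abort_reason(job: JobRecord, slack: int = 5) -> Optional[str]:
    """Guess whether the job was killed for exceeding h_rt.

    The slack (an assumption) tolerates small timestamp inaccuracies.
    Returns a reason string in the form proposed in the report, or None
    when the data does not suggest an h_rt kill.
    """
    if job.failed != 100 or job.h_rt is None:
        return None
    elapsed = job.end_time - job.start_time
    if elapsed >= job.h_rt - slack:
        return "reason=limit_exceeded, limit=h_rt"
    return None


if __name__ == "__main__":
    job = JobRecord(failed=100, start_time=1000, end_time=4605, h_rt=3600)
    print(guess_abort_reason(job))
```

This only covers h_rt and remains a guess; it cannot tell an h_rt kill apart from any other signal arriving at about the same time, which is why the report asks for an explicit reason from sge_execd.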
Change History (1)
comment:1 Changed 8 years ago by dlove
- Resolution set to fixed
- Severity set to minor
- Status changed from new to closed
AH-2005-07-25-0