Opened 13 years ago

Last modified 11 years ago

#551 new defect

IZ2667: need blackhole detection, e.g. EXIT_RATE

Reported by: templedf
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 6.2
Severity:
Keywords: execution


[Imported from gridengine issuezilla]

        Issue #:      2667             Platform:     All      Reporter: templedf (templedf)
       Component:     gridengine          OS:        All
     Subcomponent:    execution        Version:      6.2         CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    pollinger (pollinger)
      QA Contact:     pollinger
       * Summary:     need blackhole detection, e.g. EXIT_RATE
   Status whiteboard:

     Issue 2667 blocks:
   Votes for issue 2667:

   Opened: Mon Jul 21 12:16:00 -0700 2008 

Grid Engine should provide a means to recognize and automatically disable hosts
that consistently either place jobs into error state or cause them to exit
with an error condition.  In other DRM systems, administrators can set an
EXIT_RATE limit.  Any host on which more jobs exit abnormally per unit of time
than the limit allows is marked as rogue and disabled.
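
To make the request concrete, here is a minimal sketch of how an exit-rate check
could work. It is not Grid Engine code and not how any other DRM system implements
EXIT_RATE. The class name ExitRateMonitor, its parameters, and the host names
are all hypothetical, and "abnormal exit" is assumed to mean a non-zero exit
status.

```python
from collections import defaultdict, deque


class ExitRateMonitor:
    """Flags hosts whose abnormal job exits exceed a limit within a time window.

    Hypothetical illustration only; none of these names exist in Grid Engine.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit                  # abnormal exits tolerated per window
        self.window = window_seconds        # sliding window length in seconds
        self.failures = defaultdict(deque)  # host -> timestamps of abnormal exits
        self.disabled = set()               # hosts already flagged as black holes

    def record_exit(self, host, exit_status, now):
        """Record one finished job; return True if the host was just flagged."""
        # Assumption: a non-zero exit status counts as an abnormal exit.
        if exit_status == 0 or host in self.disabled:
            return False
        times = self.failures[host]
        times.append(now)
        # Drop failures that have fallen out of the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) > self.limit:
            self.disabled.add(host)
            return True
        return False


monitor = ExitRateMonitor(limit=3, window_seconds=60)
# node07 fails four jobs within a few seconds; node01 finishes cleanly.
events = [
    ("node07", 1, 0.0),
    ("node01", 0, 1.0),
    ("node07", 1, 2.0),
    ("node07", 137, 3.0),
    ("node07", 1, 4.0),
]
flagged = [host for host, status, t in events
           if monitor.record_exit(host, status, t)]
print(flagged)
```

In a real deployment the flagged host would then have to be taken out of
service, for example by disabling its queue instances with qmod -d. How the
exit events are collected and how the host is re-enabled are left open here.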

   ------- Additional comments from rayson Mon Jul 21 12:29:42 -0700 2008 -------
Having an integrated solution would be nice :)

And just for the record, there were some discussions on this topic on the list
-- as early as 2003:

Do a subject search ("Black Hole Workstations") instead of browsing the mail
thread to view the whole discussion.

