Opened 12 years ago

Last modified 9 years ago

#860 new enhancement

IZ74: intelligent crash recovery feature

Reported by: easymf Owned by:
Priority: normal Milestone:
Component: hedeby Version: current
Severity: Keywords: bootstrap
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=74]

        Issue #:      74              Platform:     All           Reporter: easymf (easymf)
       Component:     hedeby             OS:        All
     Subcomponent:    bootstrap       Version:      current          CC:    None defined
        Status:       NEW             Priority:     P3
      Resolution:                    Issue type:    ENHANCEMENT
                                  Target milestone: 1.0u5next
      Assigned to:    easymf (easymf)
      QA Contact:     adoerr
          URL:
       * Summary:     intelligent crash recovery feature
   Status whiteboard:
      Attachments:


     Issue 74 blocks:   663


   Opened: Fri Aug 3 01:50:00 -0700 2007 
------------------------


   Hedeby (CS) can be brought into an inconsistent state by the following steps:

   1. CS is stopped
   2. Any JVM is stopped locally
   3. CS is started

   This sequence is effectively the same as a "crash" of a JVM or a "crash" of the
   CS, followed by a restart of the CS.

   The above steps leave "outdated" information in the CS active component subcontext.

   Proposed fix: implement a heartbeat service in the JVMs (and in the components).
   Each JVM (component) will notify the CS at predefined intervals that it is
   alive (the active component will be rebound if necessary). Furthermore, the CS
   should implement a modified lease ticket service: if a JVM (component) fails to
   update its active component record within the predefined interval, the JVM
   (component) will be marked as inactive (and its active component record should
   be deleted).
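
   The lease idea above could be sketched roughly as follows. This is only an
   illustration under stated assumptions: the class LeaseRegistry, its methods,
   and the millisecond lease length are hypothetical and are not taken from the
   Hedeby code base. Time is passed in explicitly so that the behaviour is easy
   to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a lease table kept by the CS.
// None of these names come from Hedeby itself.
final class LeaseRegistry {
    private final long leaseMillis;
    // JVM/component name -> time of its last heartbeat, in milliseconds
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    LeaseRegistry(long leaseMillis) {
        if (leaseMillis <= 0) {
            throw new IllegalArgumentException("leaseMillis must be positive");
        }
        this.leaseMillis = leaseMillis;
    }

    /** Records a heartbeat; creates the record again if it was deleted. */
    void heartbeat(String name, long nowMillis) {
        lastSeen.put(name, nowMillis);
    }

    /** A record counts as active only while its lease has not run out. */
    boolean isActive(String name, long nowMillis) {
        Long seen = lastSeen.get(name);
        return seen != null && nowMillis - seen <= leaseMillis;
    }

    /** Deletes every record whose lease has expired and returns the removed names. */
    List<String> expire(long nowMillis) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            boolean stale = nowMillis - e.getValue() > leaseMillis;
            // remove(key, value) skips entries refreshed by a concurrent heartbeat
            if (stale && lastSeen.remove(e.getKey(), e.getValue())) {
                removed.add(e.getKey());
            }
        }
        return removed;
    }
}
```

   In such a design the CS would call expire() periodically, and a JVM that was
   stopped while the CS was down would simply never renew its lease.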

   With the above approach, it will be possible to delete all active component
   records at shutdown (or better, at startup) of the CS; the JVMs (components)
   will re-register themselves automatically once the CS is up again.
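
   As a rough illustration of the startup behaviour described above, the
   following sketch clears all records when the CS starts and lets the next
   heartbeat of each surviving JVM recreate its record. The class
   ActiveComponentTable and its method names are hypothetical, chosen only for
   this example.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; the names are not from the Hedeby code base.
final class ActiveComponentTable {
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    /** Called when the CS starts: drop every record rather than trust stale ones. */
    void clearOnStartup() {
        active.clear();
    }

    /**
     * Heartbeat from a JVM (component).
     * Returns true if the record had to be created, i.e. a (re-)registration.
     */
    boolean heartbeat(String jvmName) {
        return active.add(jvmName);
    }

    /** Sorted copy of the currently registered names. */
    Set<String> snapshot() {
        return new TreeSet<>(active);
    }
}
```

   A JVM that was stopped while the CS was down sends no heartbeat after the
   restart, so no record for it reappears.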
               ------- Additional comments from adoerr Tue May 13 23:48:40 -0700 2008 -------
   Changed issue type to "ENHANCEMENT".

               ------- Additional comments from rhierlmeier Wed Nov 25 07:21:11 -0700 2009 -------
   Milestone changed

Change History (0)
