Opened 11 years ago

Last modified 9 years ago

#911 new defect

IZ615: GE adapter internal error handled differently for START and RELOAD operations

Reported by: marcingoldyn Owned by:
Priority: normal Milestone:
Component: hedeby Version: current
Severity: Keywords: Sun gridengine_adapter
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=615]

        Issue #:      615                      Platform:     Sun         Reporter: marcingoldyn (marcingoldyn)
       Component:     hedeby                      OS:        All
     Subcomponent:    gridengine_adapter       Version:      current        CC:    None defined
        Status:       NEW                      Priority:     P3
      Resolution:                             Issue type:    DEFECT
                                           Target milestone: 1.0u5next
      Assigned to:    rhierlmeier (rhierlmeier)
      QA Contact:     rhierlmeier
          URL:
       * Summary:     GE adapter internal error handled differently for START and RELOAD operations
   Status whiteboard:
      Attachments:


     Issue 615 blocks:


   Opened: Fri Jan 23 07:45:00 -0700 2009 
------------------------


   DESCRIPTION:

   The same error can leave the component/service in different states depending on
   whether it is caught during component startup or during reload.

   Example:

   1. While the GE adapter is running, add a fake resource file to its spool
   directory, e.g. resourceFake.srf.
   2. Perform an update/reload of the GE adapter component.
   3. The GE adapter component/service state ends up as STOPPED/UNKNOWN.

   error in logs:
   Component xxxx: Error in reload procedure: Service xxxx: Unexpected error in
   state transition RunningStateHandler[RUNNING] -> ReloadingStateHandler[UNKNOWN]:
   Reload of service failed: Cannot load resource from spool:
   /usr/local/testsuite/xxxx/hedeby_yyy/spool/xxxx/resourceFake.srf

   But with the following steps:
   1. Stop the GE adapter component.
   2. Add a fake resource file to its spool directory, e.g. resourceFake.srf.
   3. Start the GE adapter component.
   4. The GE adapter component/service state ends up as STOPPED/ERROR.

   error in logs:
   Componentxxxx: Error in startup procedure: Service xxxx: Unexpected error in
   state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]:
   Service startup failed: Cannot load resource from spool:
   /usr/local/testsuite/xxxx/hedeby_yyy/spool/xxxx/resourceFake.srf

   The error that occurred was an internal GE adapter error, caused by an invalid srf
   file in the GE adapter spool. In that case the *real* Grid Engine is not touched, so
   the service state should be UNKNOWN, not ERROR. ERROR should only appear when the
   *real* GE reports an error.

   EVALUATION:

   This issue is not critical. It is mainly about the consistency of component/service
   states. Errors in the logs will guide the end user to the source of the problem.
   On the other hand, we should care about the consistency of our service/component
   states and check that they match what is described in the documentation.

   WORKAROUND/SUGGESTED FIX:

   There is no workaround. To fix the problem, the error handling in
   StartingStateHandler has to be changed so that the service is put into the
   UNKNOWN state.

   ANALYSIS:

   In the AbstractServiceAdapter class, in the private StartingStateHandler class, the
   service state is set to ERROR when a GrmException is thrown (line 789) and when a
   RuntimeException is thrown (line 792). We need to differentiate between
   internal GE adapter errors and *real* GE errors in order to set the proper
   service state.
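
   One possible way to make that distinction is sketched below. This is only an
   illustration under assumptions: the class, enum, and exception names are
   hypothetical and are not taken from the Hedeby sources. The idea is that only
   failures explicitly marked as coming from the real Grid Engine map to ERROR,
   and everything else (for example an unreadable srf file in the spool) maps to
   UNKNOWN.

```java
// Hypothetical sketch; none of these names exist in the Hedeby code base.
public class StartupErrorMapping {

    /** Subset of the service states mentioned in this issue. */
    public enum ServiceState { UNKNOWN, ERROR }

    /** Assumed marker for failures inside the adapter itself, e.g. a bad spool file. */
    public static class AdapterInternalException extends Exception {
        public AdapterInternalException(String message) {
            super(message);
        }
    }

    /** Assumed marker for failures reported by the real Grid Engine. */
    public static class GridEngineException extends Exception {
        public GridEngineException(String message) {
            super(message);
        }
    }

    /**
     * Chooses the service state after a failed startup or reload.
     * Only errors reported by the real Grid Engine yield ERROR;
     * any other failure leaves the service in UNKNOWN.
     */
    public static ServiceState stateAfterFailure(Throwable failure) {
        if (failure instanceof GridEngineException) {
            return ServiceState.ERROR;
        }
        return ServiceState.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(stateAfterFailure(
                new AdapterInternalException("Cannot load resource from spool")));
        System.out.println(stateAfterFailure(
                new GridEngineException("Grid Engine reported an error")));
    }
}
```

   If both the startup and the reload handler used one shared mapping like this,
   the two scenarios from the description would end in the same service state.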

   HOW TO TEST:

   A testsuite test should perform the two scenarios above; the GE adapter
   component/service state should be the same for both of them: STOPPED/UNKNOWN.
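
   The shape of such a check could look like the sketch below. It is not the
   actual testsuite code: the Adapter interface and the FakeAdapter are
   hypothetical stand-ins, and the fake simply models the behavior requested in
   this issue. A real test would drive the actual GE adapter component and
   compare the states it reports.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the interface and the fake are assumptions for illustration.
public class StateConsistencySketch {

    /** Assumed handle on a GE adapter; a real test would wrap the actual component. */
    public interface Adapter {
        void start();
        void stop();
        void reload();
        void addSpoolFile(String name);
        String serviceState();
    }

    /** Scenario 1: corrupt the spool while the adapter is running, then reload. */
    public static String stateAfterReload(Adapter adapter) {
        adapter.start();
        adapter.addSpoolFile("resourceFake.srf");
        adapter.reload();
        return adapter.serviceState();
    }

    /** Scenario 2: stop the adapter, corrupt the spool, then start it again. */
    public static String stateAfterStart(Adapter adapter) {
        adapter.start();
        adapter.stop();
        adapter.addSpoolFile("resourceFake.srf");
        adapter.start();
        return adapter.serviceState();
    }

    /** In-memory fake that models the desired behavior, not the current one. */
    public static class FakeAdapter implements Adapter {
        private final List<String> spool = new ArrayList<>();
        private String state = "UNKNOWN";

        // The fake treats the file name from the issue as the unloadable resource.
        private boolean spoolLoads() {
            return !spool.contains("resourceFake.srf");
        }

        @Override public void start() { state = spoolLoads() ? "RUNNING" : "UNKNOWN"; }
        @Override public void stop() { state = "UNKNOWN"; }
        @Override public void reload() { state = spoolLoads() ? "RUNNING" : "UNKNOWN"; }
        @Override public void addSpoolFile(String name) { spool.add(name); }
        @Override public String serviceState() { return state; }
    }

    public static void main(String[] args) {
        String afterReload = stateAfterReload(new FakeAdapter());
        String afterStart = stateAfterStart(new FakeAdapter());
        // The test passes only if both scenarios end in the same service state.
        System.out.println(afterReload.equals(afterStart));
    }
}
```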

   ETC 3 PD
               ------- Additional comments from rhierlmeier Wed Nov 25 07:21:11 -0700 2009 -------
   Milestone changed
               ------- Additional comments from torsten Fri Nov 27 05:09:40 -0700 2009 -------
   This issue might be fixed with the changes for issue 595. Needs to be tested again.
