Opened 12 years ago
Last modified 10 years ago
#911 new defect
IZ615: GE adapter internal error handled differently for START and RELOAD operations
Reported by: | marcingoldyn | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | hedeby | Version: | current |
Severity: | | Keywords: | Sun gridengine_adapter |
Cc: | | | |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=615]
Issue #: 615
Platform: Sun
Reporter: marcingoldyn (marcingoldyn)
Component: hedeby
OS: All
Subcomponent: gridengine_adapter
Version: current
CC: None defined
Status: NEW
Priority: P3
Issue type: DEFECT
Target milestone: 1.0u5next
Assigned to: rhierlmeier (rhierlmeier)
QA Contact: rhierlmeier
Summary: GE adapter internal error handled differently for START and RELOAD operations
Opened: Fri Jan 23 07:45:00 -0700 2009

------------------------
DESCRIPTION:
The same error can lead to different component/service states depending on whether it is caught during startup of the component or during its reload. Example:

Scenario A (reload):
1. While the GE adapter is running, add a fake resource file (e.g. resourceFake.srf) to its spool directory.
2. Update/reload the GE adapter component.
3. The GE adapter component/service state ends up STOPPED/UNKNOWN. Error in the logs:
   Component xxxx: Error in reload procedure: Service xxxx: Unexpected error in state transition RunningStateHandler[RUNNING] -> ReloadingStateHandler[UNKNOWN]: Reload of service failed: Cannot load resource from spool: /usr/local/testsuite/xxxx/hedeby_yyy/spool/xxxx/resourceFake.srf

Scenario B (startup):
1. Stop the GE adapter component.
2. Add a fake resource file (e.g. resourceFake.srf) to its spool directory.
3. Start up the GE adapter component.
4. The GE adapter component/service state ends up STOPPED/ERROR. Error in the logs:
   Componentxxxx: Error in startup procedure: Service xxxx: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: Cannot load resource from spool: /usr/local/testsuite/xxxx/hedeby_yyy/spool/xxxx/resourceFake.srf

In both cases the error is an internal GE adapter error, caused by an invalid .srf file in the GE adapter spool directory. The *real* Grid Engine is not touched at all, so the service state should be UNKNOWN, not ERROR.
ERROR should appear only when the *real* Grid Engine reports an error.

EVALUATION:
This issue is not critical; it is mainly about the consistency of component/service states. The errors in the logs will guide the end user to the source of the problem. On the other hand, we should keep our service/component states consistent and check that they match what is described in the documentation.

WORKAROUND/SUGGESTED FIX:
There is no workaround. To fix the problem, the error handling in StartingStateHandler has to be changed so that the service is put into the UNKNOWN state.

ANALYSIS:
In the AbstractServiceAdapter class, in the private StartingStateHandler class, the service state is set to ERROR when a GrmException (line 789) or, later, a RuntimeException (line 792) is thrown. To set the proper service state, we need a way to differentiate internal GE adapter errors from errors reported by the *real* Grid Engine.

HOW TO TEST:
A testsuite test should run both scenarios above; the GE adapter component/service state must be the same for both of them: STOPPED/UNKNOWN.

ETC: 3 PD

------- Additional comments from rhierlmeier Wed Nov 25 07:21:11 -0700 2009 -------
Milestone changed

------- Additional comments from torsten Fri Nov 27 05:09:40 -0700 2009 -------
This issue might be fixed by the changes for issue 595. Needs to be tested again.
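The ANALYSIS asks for a way to tell internal GE adapter errors apart from errors reported by the real Grid Engine. Below is a minimal Java sketch of one possible approach, not the hedeby implementation: it assumes two hypothetical marker exception types (GridEngineReportedException and AdapterInternalException) that do not exist in the hedeby sources, and it assumes that any other failure, including a RuntimeException, counts as internal. Both the start and the reload failure paths delegate to a single classification method, so the same internal error yields the same service state for both operations.

```java
// Hypothetical sketch: class, enum, exception, and method names are illustrative
// and are NOT taken from the hedeby AbstractServiceAdapter sources.
class StartupErrorMapping {

    /** Service states relevant to this issue. */
    enum ServiceState { UNKNOWN, ERROR }

    /** Assumed marker type: a failure actually reported by the real Grid Engine. */
    static class GridEngineReportedException extends Exception {
        GridEngineReportedException(String message) { super(message); }
    }

    /** Assumed marker type: an adapter-internal failure such as an invalid .srf spool file. */
    static class AdapterInternalException extends Exception {
        AdapterInternalException(String message) { super(message); }
    }

    /** Single classification rule shared by START and RELOAD. */
    static ServiceState classify(Throwable error) {
        // Only errors reported by the real Grid Engine put the service into ERROR;
        // everything else (internal exceptions, RuntimeExceptions) leaves it UNKNOWN.
        return (error instanceof GridEngineReportedException)
                ? ServiceState.ERROR
                : ServiceState.UNKNOWN;
    }

    /** Stand-in for the failure branch of the starting state handler. */
    static ServiceState onStartFailed(Throwable error) {
        return classify(error);
    }

    /** Stand-in for the failure branch of the reloading state handler. */
    static ServiceState onReloadFailed(Throwable error) {
        return classify(error);
    }

    public static void main(String[] args) {
        Throwable badSpool = new AdapterInternalException(
                "Cannot load resource from spool: resourceFake.srf");
        System.out.println("START failure -> " + onStartFailed(badSpool));
        System.out.println("RELOAD failure -> " + onReloadFailed(badSpool));
        System.out.println("GE-reported failure -> "
                + onStartFailed(new GridEngineReportedException("error reported by Grid Engine")));
    }
}
```

Running main prints UNKNOWN for both the START and the RELOAD failure and ERROR only for the Grid Engine-reported failure. Keeping the decision in one place is what makes the two state handlers agree; whether hedeby's actual exception types carry enough information for such a distinction is not established by this ticket.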