Opened 13 years ago

Last modified 11 years ago

#1215 new defect

IZ274: Hedeby test for issue 554 fails sporadically

Reported by: torsten
Owned by:
Priority: high
Milestone:
Component: testsuite
Version: current
Severity:
Keywords: hedeby


[Imported from gridengine issuezilla]

Issue #:           274
Platform:          All
Reporter:          torsten (torsten)
Component:         testsuite
OS:                All
Subcomponent:      hedeby
Version:           current
CC:                None defined
Status:            NEW
Priority:          P2
Resolution:
Issue type:        DEFECT
Target milestone:  milestone 1
Assigned to:       crei (crei)
QA Contact:        crei
Summary:           Hedeby test for issue 554 fails sporadically
Status whiteboard:


   Opened: Mon Jan 12 07:39:00 -0700 2009 

The test for issue 554 (issues/issue_554/check.62.exp) performs the following steps:

 o Step 1: Gather the resources and services information
 o Step 2: Shut down all services in the system with the -fr flag
 o Step 3: Check that all resources went to the resource provider
 o Step 4: Start up the stopped spare_pool
 o Step 5: Check that the spare pool got all resources from the resource provider
 o Step 6: Start up the stopped ge_adapters
 o Step 7: Cleanup

Running sds -fr on a GEAdapter triggers the uninstall of all execds. Once all
uninstall scripts have finished, the service goes into UNKNOWN state. However,
the RESOURCE_REMOVED event is sent only after the EXECD_DEL event has been
received from qmaster. It can happen that this JGDI event is not received
before the GE service goes down.

This causes the test to fail sporadically, because it is not guaranteed that
all resources are really freed. This failure is NOT related to issue 554.

The checks in Step 3 and Step 5 are therefore not guaranteed to succeed.

It can only be checked that the resources which went to ResourceProvider with
sdmadm sds -fr go to SparePool after Step 4.
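
The weaker check described above could be sketched roughly as follows. This is
only an illustration in Python, not the testsuite's actual code; the function
name and the host names are hypothetical, and the two sets are assumed to have
been collected by the test after Step 2 and after Step 4.

```python
def not_forwarded(provider_after_shutdown, spare_pool_after_startup):
    """Return the resources that were seen at the resource provider after
    the shutdown but did not arrive at spare_pool after its startup."""
    return set(provider_after_shutdown) - set(spare_pool_after_startup)


# Hypothetical snapshots; the host names are invented for illustration.
freed = {"host1", "host2"}           # at the resource provider after Step 2
spare = {"host1", "host2", "host9"}  # at spare_pool after Step 4

missing = not_forwarded(freed, spare)
if missing:
    print("resources did not reach spare_pool:", sorted(missing))
else:
    print("all freed resources reached spare_pool")
```

The check deliberately ignores resources that never reached the resource
provider, since the test cannot rely on them having been freed.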

Furthermore, one important step is missing:

 o Step 6a: Check that no resource has been lost

A resource that has not been released from the GE service must stay at the GE
service. All other resources must be in spare_pool.
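
A minimal sketch of such a "no resource lost" check, written in Python purely
for illustration (the real test is an expect script). The function name, the
dictionary layout (resource name mapped to owning service name) and the example
hosts and services are assumptions, not part of the testsuite.

```python
def find_lost_resources(initial, final, spare_pool="spare_pool"):
    """Compare two snapshots mapping resource name -> owning service.

    A resource is acceptable if it is still at its original service or has
    moved to the spare pool. Everything else is reported as a problem.
    """
    problems = {}
    for resource, original_service in initial.items():
        current = final.get(resource)
        if current is None:
            problems[resource] = "missing from the system"
        elif current not in (original_service, spare_pool):
            problems[resource] = "unexpectedly at service " + current
    return problems


# Hypothetical snapshots taken in Step 1 and after Step 6.
before = {"host1": "ge1", "host2": "ge1", "host3": "ge2"}
after = {"host1": "spare_pool", "host2": "ge1", "host3": "spare_pool"}

for name, reason in sorted(find_lost_resources(before, after).items()):
    print(name, reason)
```

With this rule, a resource whose release was not completed before the GE
service went down is accepted as long as it is still assigned to that service.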

BTW: The description in the adoc header of the issue 554 test is not correct.
