Opened 14 years ago

Last modified 9 years ago

#251 new defect

IZ1645: qsub -sync y waits forever if the event client was killed

Reported by: roland
Owned by:
Priority: low
Milestone:
Component: sge
Version: 6.0u4
Severity:
Keywords: Sun clients
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1645]

   Issue #:           1645
   Platform:          Sun
   Reporter:          roland (roland)
   Component:         gridengine
   OS:                All
   Subcomponent:      clients
   Version:           6.0u4
   CC:                None defined
   Status:            NEW
   Priority:          P4
   Resolution:
   Issue type:        DEFECT
   Target milestone:  ---
   Assigned to:       andreas (andreas)
   QA Contact:        roland
   URL:
   Summary:           qsub -sync y waits forever if the event client was killed
   Status whiteboard:
   Attachments:

   Issue 1645 blocks:
   Votes for issue 1645:


   Opened: Tue May 31 02:25:00 -0700 2005 
------------------------


Submitting a job with qsub -sync y can result in an endless wait when the event
client is killed with "qconf -kec 11".

The trace below shows that qsub receives the shutdown event and unregisters the
event client properly. The submitted job is not killed and continues to execute.
After the job finishes, the qsub client never exits because it does not
recognize the end of the job.

...
   256   6801 5     try to get request from qmaster, id 1
   257   6801 5     Sent ack for all events lower or equal 3
   258   6801 5     ec_get - received 1 events
   259   6801 5         Event: 3. EVENT SHUTDOWN intkey 0 intkey2 0
   260   6801 5     Received shutdown message
   261   6801 5     unregistering from qmaster ...
   262   6801 5     ... unregistered.
^C
Interrupted!
...

   ------- Additional comments from templedf Thu Jul 21 02:42:36 -0700 2005 -------
I now understand what's wrong here.  The problem is that japi_exit() will shut
down the event client thread, but not the other way around.  If the event client
thread is killed, any blocked japi wait calls will remain forever blocked.  The
solution would be to have an event client shutdown trigger a call to japi_exit().
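
The proposed direction can be illustrated with a small, self-contained sketch.
This is not Grid Engine code and does not use the real JAPI. The names Session,
event_client, job_finished and the "SHUTDOWN" / "JOB_FINISH" event tags are
hypothetical, chosen only for illustration. It assumes that blocked wait calls
sleep on a condition variable, and shows how a shutdown event seen by the event
client thread could close the session and wake those waiters instead of
leaving them blocked.

```python
import threading


class Session:
    """Hypothetical stand-in for a JAPI-like session (not the real API)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._finished = set()   # ids of jobs reported as finished
        self._closed = False     # set once the session has been shut down

    def job_finished(self, job_id):
        # Record the finished job and wake every blocked waiter.
        with self._cond:
            self._finished.add(job_id)
            self._cond.notify_all()

    def exit(self):
        # Close the session and wake every blocked waiter.
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def wait(self, job_id, timeout=None):
        # Block until the job finishes, the session closes, or the timeout expires.
        with self._cond:
            self._cond.wait_for(
                lambda: job_id in self._finished or self._closed, timeout
            )
            if job_id in self._finished:
                return "finished"
            if self._closed:
                return "session closed"
            return "timeout"


def event_client(session, events):
    """Hypothetical event client loop over (kind, job_id) tuples."""
    for kind, job_id in events:
        if kind == "JOB_FINISH":
            session.job_finished(job_id)
        elif kind == "SHUTDOWN":
            # The step missing in the reported behaviour: propagate the
            # shutdown to the session so that waiters are released.
            session.exit()
            return


def demo():
    session = Session()
    result = []
    waiter = threading.Thread(
        target=lambda: result.append(session.wait(42, timeout=5))
    )
    waiter.start()
    killer = threading.Thread(
        target=event_client, args=(session, [("SHUTDOWN", None)])
    )
    killer.start()
    killer.join()
    waiter.join()
    return result[0]
```

In this sketch the waiter returns as soon as the shutdown event is handled,
whether it started waiting before or after the event arrived, because the
closed flag is checked under the same lock that protects the finished set.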

   ------- Additional comments from sgrell Tue Dec 6 08:11:10 -0700 2005 -------
Changed the Subcomponent.

Stephan

   ------- Additional comments from roland Wed Dec 7 02:22:46 -0700 2005 -------
Lowering priority because an admin must explicitly kill the event client to run
into this issue.
