Opened 12 years ago

Last modified 9 years ago

#489 new defect

IZ2493: Qmaster restart takes long time after short duration maintenance shutdown

Reported by: andreas Owned by:
Priority: low Milestone:
Component: sge Version: 6.1AR_snapshot3_2
Severity: Keywords: qmaster
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2493]

        Issue #:      2493             Platform:     All                 Reporter: andreas (andreas)
       Component:     gridengine          OS:        All
     Subcomponent:    qmaster          Version:      6.1AR_snapshot3_2      CC: bbarth
        Status:       NEW              Priority:     P4
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
          URL:
       * Summary:     Qmaster restart takes long time after short duration maintenance shutdown
   Status whiteboard:
      Attachments:

     Issue 2493 blocks:
   Votes for issue 2493:


   Opened: Fri Feb 15 09:32:00 -0700 2008 
------------------------


DESCRIPTION:
If qmaster is shut down for a short time and then restarted, many execd load
reports can queue up in qmaster until it has completed its startup phase. For
example, in a cluster with a large number of nodes, 159060 messages were
observed in qmaster's incoming buffer with qping -info. Qmaster did eventually
come up, but because of the queued messages it took additional time before it
became available to process user requests such as qstat.
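
To get a feel for how such a backlog can build up, the following is a rough
back-of-the-envelope sketch, not a model of the actual qmaster or execd
behaviour. It assumes that every execd sends a fixed number of reports per
interval for the whole startup phase and that none are dropped. All names and
figures (host count, startup duration, report interval, drain rate) are
hypothetical and are not taken from the cluster described above.

```python
def estimate_backlog(num_hosts, startup_seconds, report_interval_seconds,
                     reports_per_interval=1):
    """Rough upper bound on messages queued while qmaster is starting up.

    Assumption (hypothetical): each host sends `reports_per_interval`
    messages every `report_interval_seconds`, and nothing is discarded.
    """
    if report_interval_seconds <= 0:
        raise ValueError("report_interval_seconds must be positive")
    # Number of complete reporting intervals that fit into the startup phase.
    intervals = startup_seconds // report_interval_seconds
    return num_hosts * intervals * reports_per_interval


def drain_seconds(backlog, messages_per_second):
    """Time needed to work off the backlog at an assumed processing rate."""
    if messages_per_second <= 0:
        raise ValueError("messages_per_second must be positive")
    return backlog / messages_per_second


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    backlog = estimate_backlog(num_hosts=1000, startup_seconds=600,
                               report_interval_seconds=40)
    print("estimated backlog:", backlog)
    print("seconds to drain at 500 msg/s:", drain_seconds(backlog, 500))
```

The point of the sketch is only that the backlog grows with both the number of
hosts and the length of the startup phase, so a slow startup on a large cluster
delays user requests twice: once for the startup itself and once for draining
the queue.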

   ------- Additional comments from andreas Fri Feb 15 10:31:43 -0700 2008 -------
Fixed version.

   ------- Additional comments from andreas Fri Feb 15 12:27:37 -0700 2008 -------
A related issue is #2483.

   ------- Additional comments from andreas Thu Feb 21 05:33:57 -0700 2008 -------
Related #2500 will address the issue of messages queueing up.

   ------- Additional comments from crei Fri Apr 18 02:43:18 -0700 2008 -------
The real problem is that reloading the spooled jobs takes so long. Changing
message acceptance at qmaster startup would cause other problems (a possible
takeover by the shadow daemon, because shadowd thinks qmaster is down).
