Opened 9 years ago

Last modified 9 years ago

#786 new defect

IZ3248: Should not use adjusted load average for suspending jobs

Reported by: opoplawski
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 6.2u5
Severity:
Keywords: PC Windows scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3248]

        Issue #:      3248             Platform:     PC              Reporter: opoplawski (opoplawski)
       Component:     gridengine          OS:        Windows Vista
     Subcomponent:    scheduling       Version:      6.2u5              CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
          URL:
       * Summary:     Should not use adjusted load average for suspending jobs
   Status whiteboard:
      Attachments:

     Issue 3248 blocks:
   Votes for issue 3248:


   Opened: Tue Mar 9 10:16:00 -0700 2010 
------------------------


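It appears that the scheduler checks suspend thresholds against the adjusted load (the reported load plus job load adjustments) rather than against the load the host actually reports. This causes short jobs to be suspended unnecessarily.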

E.g.:

compute.q@apus.cora.nwra.com   BIPC  0/3/4          2.19     lx26-amd64
queue instance "compute.q@apus.cora.nwra.com" is in suspend alarm: np_load_avg=1.247500 (= 0.542500 + 1.0 * 2.820000 with nproc=4) >= 1.05
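The numbers in the alarm message can be checked by hand. The minimal Python sketch below assumes that the adjustment term (1.0 * 2.820000) is divided by nproc=4 before being added to the reported np_load_avg. That reading is inferred from the values themselves (0.5425 + 2.82 / 4 = 1.2475), not from the Grid Engine source, and the function name is hypothetical:

```python
# Reproduce the arithmetic in the suspend-alarm message.
# Assumption (inferred from the numbers, not from Grid Engine code):
# adjusted = reported np_load_avg + factor * adjustment / nproc


def adjusted_np_load_avg(reported_np_load, factor, adjustment, nproc):
    """Hypothetical helper: reported per-CPU load plus the per-CPU share of the job load adjustment."""
    return reported_np_load + factor * adjustment / nproc


REPORTED = 0.5425   # np_load_avg reported by the host
FACTOR = 1.0        # factor shown in the message
ADJUSTMENT = 2.82   # job load adjustment shown in the message
NPROC = 4
THRESHOLD = 1.05    # suspend threshold on np_load_avg

adjusted = adjusted_np_load_avg(REPORTED, FACTOR, ADJUSTMENT, NPROC)
print(f"adjusted np_load_avg = {adjusted:.4f}")
print("suspend on adjusted load:", adjusted >= THRESHOLD)
print("suspend on reported load:", REPORTED >= THRESHOLD)
```

Under this reading the reported load alone (0.5425) is well below the 1.05 threshold; the alarm is triggered entirely by the adjustment term.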

The machine is otherwise idle, yet only 2-3 jobs are allowed to run at any given moment. Each of these jobs runs for only a minute or so, so the load adjustment attributed to a job can outlast the job itself.

Another option would be to remove the load adjustment caused by a job when that job exits. This would help with load thresholds as well as suspend thresholds.
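The proposal above can be illustrated with a toy model. This is a hypothetical Python sketch, not Grid Engine code: `HostLoadModel`, its fields, the per-job adjustment of 1.0, and the 450-second linear decay are all assumptions chosen for illustration (in Grid Engine the corresponding knobs are the scheduler configuration's `job_load_adjustments` and `load_adjustment_decay_time`). It compares keeping a finished job's adjustment until it decays with dropping it when the job exits:

```python
from dataclasses import dataclass, field


@dataclass
class HostLoadModel:
    """Hypothetical model of job load adjustments on one host (not Grid Engine code)."""
    reported_np_load: float            # per-CPU load actually reported by the host
    nproc: int
    per_job_adjustment: float = 1.0    # assumed load added per started job
    decay_time: float = 450.0          # assumed decay window in seconds
    drop_on_exit: bool = False         # True = behaviour proposed in this ticket
    starts: dict = field(default_factory=dict)  # job id -> start time in seconds

    def job_started(self, job_id: str, now: float) -> None:
        self.starts[job_id] = now

    def job_exited(self, job_id: str) -> None:
        # Proposed change: discard the adjustment as soon as the job exits.
        # Otherwise the adjustment keeps decaying as if the job were still running.
        if self.drop_on_exit:
            self.starts.pop(job_id, None)

    def adjusted_np_load(self, now: float) -> float:
        total = 0.0
        for start in self.starts.values():
            age = now - start
            if age < self.decay_time:
                # Assumed linear decay: full adjustment at start, zero at decay_time.
                total += self.per_job_adjustment * (1.0 - age / self.decay_time)
        return self.reported_np_load + total / self.nproc


def load_after_short_jobs(drop_on_exit: bool) -> float:
    """Three one-minute jobs start at t=0 and exit at t=60; sample the load at t=90."""
    host = HostLoadModel(reported_np_load=0.5425, nproc=4, drop_on_exit=drop_on_exit)
    for job in ("j1", "j2", "j3"):
        host.job_started(job, now=0.0)
    for job in ("j1", "j2", "j3"):
        host.job_exited(job)
    return host.adjusted_np_load(now=90.0)


THRESHOLD = 1.05
print(load_after_short_jobs(False) >= THRESHOLD)  # adjustment outlives the jobs
print(load_after_short_jobs(True) >= THRESHOLD)   # adjustment removed on exit
```

With these assumed values, thirty seconds after three one-minute jobs finish the adjusted load is still 1.1425, above the 1.05 threshold, whereas dropping the adjustments on exit leaves only the reported 0.5425.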

Change History (0)