[GE issues] [Issue 3248] New - Should not use adjusted load average for suspending jobs

opoplawski orion at cora.nwra.com
Tue Mar 9 17:17:01 GMT 2010


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3248
                 Issue #|3248
                 Summary|Should not use adjusted load average for suspending jobs
               Component|gridengine
                 Version|6.2u5
                Platform|PC
                     URL|
              OS/Version|Windows Vista
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|scheduling
             Assigned to|andreas
             Reported by|opoplawski






------- Additional comments from opoplawski at sunsource.net Tue Mar  9 09:16:56 -0800 2010 -------
It appears that the scheduler evaluates suspend thresholds against the adjusted load, that is, the measured load plus the job load adjustment.  This causes short jobs to be suspended unnecessarily.

E.g.:

compute.q@apus.cora.nwra.com   BIPC  0/3/4          2.19     lx26-amd64
queue instance "compute.q@apus.cora.nwra.com" is in suspend alarm: np_load_avg=1.247500 (= 0.542500 + 1.0 * 2.820000 with nproc=4) >= 1.05

The machine is otherwise idle, but only 2-3 jobs are allowed to run at any given moment.  These jobs only take a minute or so to run.
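
To make the arithmetic in the alarm message explicit, here is a rough sketch in Python. It assumes the message reads as measured load + adjustment factor * weighted job count / nproc, which reproduces the reported 1.2475. The function and variable names are made up for illustration and are not Grid Engine identifiers.

```python
def adjusted_np_load_avg(measured, factor, weighted_jobs, nproc):
    """Measured normalized load plus the per-job adjustment, spread over nproc.

    Assumed reading of the alarm message, not taken from Grid Engine source.
    """
    return measured + factor * weighted_jobs / nproc


# Values copied from the suspend alarm message above.
measured = 0.5425      # np_load_avg actually reported by the host
factor = 1.0           # adjustment factor shown in the message
weighted_jobs = 2.82   # weighted count of recently started jobs
nproc = 4
threshold = 1.05       # suspend threshold shown in the message

adjusted = adjusted_np_load_avg(measured, factor, weighted_jobs, nproc)

print(f"measured={measured:.4f} adjusted={adjusted:.4f} threshold={threshold}")
print("alarm on adjusted load:", adjusted >= threshold)
print("alarm on measured load:", measured >= threshold)
```

With these numbers the alarm is raised only because of the adjustment term; the measured load by itself is well under the threshold.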

Another option would be to remove the load adjustment caused by a job when that job exits.  This would help with load thresholds as well as suspend thresholds.
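
A small sketch of what that option could look like, in Python. It assumes each job's adjustment decays linearly from 1.0 to 0.0 over a configured decay time, and it compares keeping the adjustment for finished jobs against dropping it at exit. The decay time of 450 seconds, the job timings, and all names are hypothetical values for illustration only.

```python
def job_weight(age_s, decay_s):
    """Linearly decaying weight: 1.0 at dispatch, 0.0 once decay_s has elapsed."""
    return max(0.0, 1.0 - age_s / decay_s)


def total_adjustment(jobs, now, decay_s, drop_finished):
    """Sum the weights of all jobs started by `now`.

    jobs is a list of (start, end) times in seconds; end is None while the
    job is still running. With drop_finished=True a job stops contributing
    as soon as it has exited (the behavior proposed in this issue).
    """
    total = 0.0
    for start, end in jobs:
        if start > now:
            continue  # not dispatched yet
        if drop_finished and end is not None and end <= now:
            continue  # job already exited: discard its adjustment
        total += job_weight(now - start, decay_s)
    return total


# Hypothetical workload: three one-minute jobs started 30 s apart.
jobs = [(0, 60), (30, 90), (60, None)]
decay_s = 450  # assumed decay time, in seconds
now = 100

keep = total_adjustment(jobs, now, decay_s, drop_finished=False)
drop = total_adjustment(jobs, now, decay_s, drop_finished=True)
print(f"adjustment keeping finished jobs:  {keep:.3f}")
print(f"adjustment dropping finished jobs: {drop:.3f}")
```

In this toy case only the job that is still running contributes once finished jobs are dropped, so short jobs no longer inflate the load seen by either the load thresholds or the suspend thresholds.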

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=247722
