Opened 10 years ago

Closed 7 years ago

#724 closed defect (fixed)

IZ3148: jobs do not always go to the least loaded host

Reported by: petrik Owned by:
Priority: normal Milestone:
Component: sge Version: 6.2u3
Severity: minor Keywords: scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3148]

   Issue #:            3148
   Reporter:           petrik (petrik)
   Component:          gridengine
   Subcomponent:       scheduling
   Platform:           All
   OS:                 All
   Version:            6.2u3
   CC:                 None defined
   Status:             NEW
   Priority:           P3
   Resolution:
   Issue type:         DEFECT
   Target milestone:   ---
   Assigned to:        andreas (andreas)
   QA Contact:         andreas
   URL:
   Summary:            jobs do not always go to the least loaded host
   Status whiteboard:
   Attachments:

   Issue 3148 blocks:
   Votes for issue 3148:


   Opened: Thu Oct 1 05:56:00 -0700 2009 
------------------------


This issue was brought up during the Gridengine Workshop in Regensburg in September 2009.

How to reproduce:

Although the scenarios of the three users differ slightly, the problem is easy to reproduce:

0) The cluster needs two exec hosts.

1) Create two queues in SGE, one named highprio.q and the other lowprio.q. Subordinate lowprio.q to
   highprio.q so that it gets suspended as soon as a single slot is filled.

2) Create a PE with the allocation rule set to $fill_up and no starter methods. Add the PE to both of the
   queues created in step 1.

4) Submit a job (e.g. sleeper) to the low-priority queue, requesting only enough slots to fill part of the
   exec hosts in the queue. For example, with two exec hosts of 4 slots each, request only 4 slots, so that
   at least one exec host is still free after the job is scheduled. This makes it easy to see where the
   next job gets scheduled. Either of the following will do:

  qrsh -pe pe_test 4 -q lowprio.q
  qsub -pe pe_test 4 -q lowprio.q sleeper.sh 3000

5) Once you land on one of the exec hosts, run a program that increases the load of that host while
   keeping it below the np_load_avg threshold configured on your cluster. A while() loop will do.
   Alternatively, set load_avg=1.5 in the complex_values of the host that runs the job (qconf -me <host>);
   the host will then always report a load of 1.5.

6) Wait until qhost reports an increased load for the exec host on which your looping job is running.

7) Now submit a job to the high-priority queue using the same PE and observe where the job gets scheduled.
   The expectation is that the job lands on the empty host, where there is no load.
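
The setup in steps 1 and 2 and the check in step 7 can be sketched as a small shell script. This is only a rough sketch under assumptions, not part of the original report: it assumes highprio.q and lowprio.q already exist, that the PE is called pe_test (the name used in step 4), and that 8 slots matches two hosts with 4 slots each. The file name pe_test.conf and the helper least_loaded_host are made up for illustration, and /bin/true stands in for "no starter methods".

```shell
#!/bin/sh
# Hypothetical PE definition for step 2. The slot count and the file name
# are assumptions; allocation_rule $fill_up is taken from the report.
cat > pe_test.conf <<'EOF'
pe_name            pe_test
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF

# Illustrative helper: read qhost-style output on stdin and print the host
# with the lowest value in the LOAD column (4th field). The two header
# lines, the "global" pseudo host and hosts without a numeric load are skipped.
least_loaded_host() {
    awk 'NR > 2 && $1 != "global" && $4 ~ /^[0-9.]+$/ {
             if (best == "" || $4 + 0 < min) { min = $4 + 0; best = $1 }
         }
         END { if (best != "") print best }'
}

# Only touch the cluster configuration when the SGE tools are available.
if command -v qconf >/dev/null 2>&1; then
    qconf -Ap pe_test.conf                                      # add the PE from the file
    qconf -aattr queue pe_list pe_test highprio.q               # attach the PE to both queues
    qconf -aattr queue pe_list pe_test lowprio.q
    qconf -mattr queue subordinate_list lowprio.q=1 highprio.q  # suspend lowprio.q at 1 used slot
    qhost | least_loaded_host                                   # host the step 7 job is expected on
fi
```

After submitting the high-priority job in step 7, the host printed by the last command can be compared with the host on which the job actually runs. Note that the LOAD column of qhost is not necessarily the value the scheduler sorts by (that depends on the load_formula in the scheduler configuration), so the comparison is only a rough indication.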


Here is a link where this issue was already discussed:

http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=158867

Change History (1)

comment:1 Changed 7 years ago by dlove

  • Resolution set to fixed
  • Severity set to minor
  • Status changed from new to closed