Opened 11 years ago
Closed 9 years ago
#724 closed defect (fixed)
IZ3148: jobs do not always go to the least loaded host
| Field | Value | Field | Value |
|---|---|---|---|
| Reported by | petrik | Owned by | |
| Priority | normal | Milestone | |
| Component | sge | Version | 6.2u3 |
| Severity | minor | Keywords | scheduling |
| Cc | | | |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3148]
Issue #: 3148
Platform: All
Reporter: petrik (petrik)
Component: gridengine
OS: All
Subcomponent: scheduling
Version: 6.2u3
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: DEFECT
Target milestone: ---
Assigned to: andreas (andreas)
QA Contact: andreas
URL:
Summary: jobs do not always go to the least loaded host
Status whiteboard:
Attachments:
Issue 3148 blocks:
Votes for issue 3148:
Opened: Thu Oct 1 05:56:00 -0700 2009
------------------------

This issue was brought up during the Gridengine Workshop in Regensburg in September 2009.

How to reproduce: the scenarios of the three users differ slightly, but the problem is easy to reproduce:

0) Set up a cluster with at least two exec hosts.
1) Create two queues, highprio.q and lowprio.q. Make lowprio.q subordinate to highprio.q so that lowprio.q is suspended as soon as a single slot of highprio.q is occupied.
2) Create a PE and set its allocation rule to $fill_up, with no starter methods. Add the PE to both queues created in step 1.
4) Submit a job (e.g. a sleeper) to lowprio.q, requesting only enough slots to fill part of the queue's exec hosts (with two exec hosts of 4 slots each, request only 4 slots). After the job is scheduled, at least one exec host should be completely free, which shows where the next job gets placed. For example:
   qrsh -pe pe_test 4 -q lowprio.q
   or
   qsub -pe pe_test 4 -q lowprio.q sleeper.sh 3000
5) Once the job is running on one of the exec hosts, run a program there that raises the host's load while keeping it below the np_load_avg threshold configured on your cluster. A while() loop will do. Alternatively, set load_avg=1.5 in the complex_values of the host that runs the job (qconf -me <host>); the host will then always report a load of 1.5.
6) Wait until qhost reports the increased load for the exec host running the looping job.
7) Submit a job to highprio.q using the same PE and observe where it gets scheduled.

Expected result: the job lands on the empty host, which has no load.

This issue was already discussed here: http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=158867
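The expected outcome of step 7 can be modelled with a small sketch. This is not Grid Engine's scheduler code. It assumes load-based host sorting (lowest load first), a single normalized load value per host standing in for np_load_avg, and a hypothetical default threshold of 1.75; the names `Host` and `fill_up` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    free_slots: int   # free slots in the target queue on this host
    load: float       # hypothetical stand-in for the host's np_load_avg


def fill_up(hosts: List[Host], slots_needed: int,
            load_threshold: float = 1.75) -> Optional[Dict[str, int]]:
    """Model of the *expected* placement: drop hosts at or above the load
    threshold, sort the rest by load (least loaded first), then fill each
    host's free slots in that order until the PE request is satisfied."""
    eligible = [h for h in hosts if h.load < load_threshold and h.free_slots > 0]
    eligible.sort(key=lambda h: h.load)  # least loaded host first
    allocation: Dict[str, int] = {}
    remaining = slots_needed
    for host in eligible:
        take = min(host.free_slots, remaining)
        allocation[host.name] = take
        remaining -= take
        if remaining == 0:
            return allocation
    return None  # not enough free slots on eligible hosts


# Scenario from the ticket: host1 runs the looping job (load 1.5), host2 is idle.
hosts = [Host("host1", 4, 1.5), Host("host2", 4, 0.0)]
print(fill_up(hosts, 4))
```

In this model a 4-slot request lands entirely on the idle host2, which is the behaviour the reporter expected; the ticket reports that the real scheduler did not always do this.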
Change History (1)
comment:1 Changed 9 years ago by dlove
- Resolution set to fixed
- Severity set to minor
- Status changed from new to closed