Opened 11 years ago
Closed 8 years ago
#767 closed defect (fixed)
IZ3220: exclusive host access prevents resource reservation for waiting jobs
| Reported by: | ccaamad | Owned by: | |
|---|---|---|---|
| Priority: | high | Milestone: | |
| Component: | sge | Version: | 6.2u4 |
| Severity: | minor | Keywords: | PC Linux scheduling |
| Cc: | | | |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3220]
Issue #: 3220
Platform: PC
Reporter: ccaamad (ccaamad)
Component: gridengine
OS: Linux
Subcomponent: scheduling
Version: 6.2u4
CC: None defined
Status: NEW
Priority: P2
Resolution:
Issue type: DEFECT
Target milestone: ---
Assigned to: andreas (andreas)
QA Contact: andreas
URL:
Summary: exclusive host access prevents resource reservation for waiting jobs
Status whiteboard:
Attachments:
Issue 3220 blocks:
Votes for issue 3220:
Opened: Mon Jan 11 08:31:00 -0700 2010
------------------------
If there are jobs running with exclusive=true set, those slots are removed from consideration for waiting jobs by resource reservation. This makes the "exclusive" feature useless to me: I wanted to use it to pack each parallel job onto the minimum number of hosts.

Look at "qstat -g c". Add up the numbers in the "TOTAL" column, subtract the numbers in the "cdsuE" column, then subtract the number of slots belonging to queue instances with a host-exclusive job in them. The number you are left with is the size of the biggest parallel job that will have resources reserved for it; any bigger job will be starved by waiting smaller jobs.

For example, create a test cluster with a single queue and four 8-slot exec hosts. Enable exclusive job scheduling on all hosts. For illustration purposes, disable one of the queue instances:

    $ qstat -f
    queuename                     qtype resv/used/tot. load_avg arch       states
    ---------------------------------------------------------------------------------
    smp.q@smp1.arc1.leeds.ac.uk   BIP   0/0/8          0.00     lx24-amd64
    ---------------------------------------------------------------------------------
    smp.q@smp2.arc1.leeds.ac.uk   BIP   0/0/8          0.00     lx24-amd64
    ---------------------------------------------------------------------------------
    smp.q@smp3.arc1.leeds.ac.uk   BIP   0/0/8          0.00     lx24-amd64
    ---------------------------------------------------------------------------------
    smp.q@smp4.arc1.leeds.ac.uk   BIP   0/0/8          0.00     lx24-amd64 d

Submit a 14-slot host-exclusive job and an ordinary 1-slot job:

    $ qsub -clear -cwd -l h_rt=1:0:0,exclusive=true -R y -pe mpi 14 wait.sh
    Your job 45 ("wait.sh") has been submitted
    $ qsub -clear -cwd -l h_rt=1:0:0 -R y wait.sh
    Your job 49 ("wait.sh") has been submitted
    $ qstat -f
    queuename                     qtype resv/used/tot. load_avg arch       states
    ---------------------------------------------------------------------------------
    smp.q@smp1.arc1.leeds.ac.uk   BIP   0/1/8          0.00     lx24-amd64
         49 0.50500 wait.sh    issmcd       r     01/11/2010 15:02:48     1
    ---------------------------------------------------------------------------------
    smp.q@smp2.arc1.leeds.ac.uk   BIP   0/8/8          0.00     lx24-amd64
         45 0.60500 wait.sh    issmcd       r     01/11/2010 14:59:24     8
    ---------------------------------------------------------------------------------
    smp.q@smp3.arc1.leeds.ac.uk   BIP   0/6/8          0.00     lx24-amd64
         45 0.60500 wait.sh    issmcd       r     01/11/2010 14:59:24     6
    ---------------------------------------------------------------------------------
    smp.q@smp4.arc1.leeds.ac.uk   BIP   0/0/8          0.00     lx24-amd64 d

Submit an 8-slot job and a 9-slot job:

    $ qsub -clear -cwd -l h_rt=1:0:0 -R y -pe mpi 8 wait.sh
    Your job 50 ("wait.sh") has been submitted
    $ qsub -clear -cwd -l h_rt=1:0:0 -R y -pe mpi 9 wait.sh
    Your job 51 ("wait.sh") has been submitted

With MONITOR=true set in the scheduler configuration, I can see that only the 8-slot job has resources reserved for it. The 9-slot job is left to starve.
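The slot arithmetic described in the report can be sketched as follows. This is a hypothetical illustration, not SGE code: the `max_reservable_slots` helper and the tuple layout are invented for this sketch, and the hard-coded cluster mirrors the four-host example after the host-exclusive job 45 has started on smp2 and smp3.

```python
# Hypothetical sketch of the reservation-size arithmetic from the report:
# total slots, minus slots in cdsuE-state instances, minus slots on hosts
# holding a host-exclusive job, is the biggest job that gets a reservation.
def max_reservable_slots(queue_instances):
    """Each instance is (total_slots, states, has_exclusive_job)."""
    reservable = 0
    for total, states, has_exclusive_job in queue_instances:
        if any(s in states for s in "cdsuE"):  # unusable instance: subtract
            continue
        if has_exclusive_job:  # removed from reservation consideration
            continue
        reservable += total
    return reservable

cluster = [
    (8, "",  False),  # smp1: only the 1-slot job 49 running
    (8, "",  True),   # smp2: host-exclusive job 45
    (8, "",  True),   # smp3: host-exclusive job 45
    (8, "d", False),  # smp4: disabled for illustration
]
print(max_reservable_slots(cluster))  # 8
```

This matches the observed behaviour: the 8-slot job 50 gets a reservation, while the 9-slot job 51 exceeds the 8 reservable slots and starves.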
Change History (1)
comment:1 Changed 8 years ago by dlove
- Resolution set to fixed
- Severity set to minor
- Status changed from new to closed
Seems to be fixed now (Mark agrees)