[GE issues] [Issue 3220] New - exclusive host access prevents resource reservation for waiting jobs
ccaamad
m.c.dixon at leeds.ac.uk
Mon Jan 11 15:31:15 GMT 2010
http://gridengine.sunsource.net/issues/show_bug.cgi?id=3220
Issue #|3220
Summary|exclusive host access prevents resource reservation for waiting jobs
Component|gridengine
Version|6.2u4
Platform|PC
URL|
OS/Version|Linux
Status|NEW
Status whiteboard|
Keywords|
Resolution|
Issue type|DEFECT
Priority|P2
Subcomponent|scheduling
Assigned to|andreas
Reported by|ccaamad
------- Additional comments from ccaamad at sunsource.net Mon Jan 11 07:31:11 -0800 2010 -------
If there are jobs running with exclusive=true set, the slots on the hosts they hold exclusively are removed from consideration when
resources are reserved for waiting jobs.
This makes the "exclusive" feature useless to me. I wanted to use it to pack each parallel job onto the minimum number of hosts.
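(For context, the exclusive access used here is the standard host consumable introduced in 6.2u3; it is defined in the complex roughly as
follows. This is only a sketch, the exact definition on my cluster may differ slightly:)
$ qconf -sc
#name        shortcut  type  relop  requestable  consumable  default  urgency
...
exclusive    excl      BOOL  EXCL   YES          YES         0        1000
...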
Look at "qstat -g c". Add-up the numbers in the "TOTAL" column. Subtract the numbers in the "cdsuE" column. Subtract the number of slots
belonging to queue instances with a host-exclusive job in them. The number you are left with is the biggest parallel job which will have
resources reserved for it. Any bigger will be starved by any waiting smaller jobs.
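With the four-host test cluster from the example below and no jobs yet running, "qstat -g c" looks roughly like this (illustrative output,
not captured from the cluster; the column set can vary slightly between versions):
$ qstat -g c
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
smp.q                             0.00      0      0     24     32      0      8
32 in TOTAL minus 8 in cdsuE gives 24 reservable slots; every host that then picks up a host-exclusive job drops its 8 slots from that figure.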
e.g.
Create a test cluster with a single queue and four 8-slot exec hosts. Enable exclusive job scheduling on all hosts. For illustration
purposes, disable one of the queue instances:
$ qstat -f
queuename                        qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
smp.q@smp1.arc1.leeds.ac.uk      BIP   0/0/8          0.00     lx24-amd64
---------------------------------------------------------------------------------
smp.q@smp2.arc1.leeds.ac.uk      BIP   0/0/8          0.00     lx24-amd64
---------------------------------------------------------------------------------
smp.q@smp3.arc1.leeds.ac.uk      BIP   0/0/8          0.00     lx24-amd64
---------------------------------------------------------------------------------
smp.q@smp4.arc1.leeds.ac.uk      BIP   0/0/8          0.00     lx24-amd64    d
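(For completeness, the setup above was produced with commands along these lines, after adding the exclusive complex shown earlier. A
sketch; the exact syntax for setting complex_values may differ:)
$ qconf -aattr exechost complex_values exclusive=true smp1.arc1.leeds.ac.uk
$ qconf -aattr exechost complex_values exclusive=true smp2.arc1.leeds.ac.uk
$ qconf -aattr exechost complex_values exclusive=true smp3.arc1.leeds.ac.uk
$ qconf -aattr exechost complex_values exclusive=true smp4.arc1.leeds.ac.uk
$ qmod -d smp.q@smp4.arc1.leeds.ac.uk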
Submit a 14-slot host-exclusive job, and an ordinary 1-slot job:
$ qsub -clear -cwd -l h_rt=1:0:0,exclusive=true -R y -pe mpi 14 wait.sh
Your job 45 ("wait.sh") has been submitted
$ qsub -clear -cwd -l h_rt=1:0:0 -R y wait.sh
Your job 49 ("wait.sh") has been submitted
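wait.sh is nothing special, just a placeholder that holds its slots for the requested run time; something like this would do (a sketch,
not the exact script):
#!/bin/bash
# occupy the granted slots until h_rt expires
sleep 3600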
$ qstat -f
queuename                        qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
smp.q@smp1.arc1.leeds.ac.uk      BIP   0/1/8          0.00     lx24-amd64
     49 0.50500 wait.sh    issmcd       r     01/11/2010 15:02:48     1
---------------------------------------------------------------------------------
smp.q@smp2.arc1.leeds.ac.uk      BIP   0/8/8          0.00     lx24-amd64
     45 0.60500 wait.sh    issmcd       r     01/11/2010 14:59:24     8
---------------------------------------------------------------------------------
smp.q@smp3.arc1.leeds.ac.uk      BIP   0/6/8          0.00     lx24-amd64
     45 0.60500 wait.sh    issmcd       r     01/11/2010 14:59:24     6
---------------------------------------------------------------------------------
smp.q@smp4.arc1.leeds.ac.uk      BIP   0/0/8          0.00     lx24-amd64    d
Submit an 8-slot and a 9-slot job:
$ qsub -clear -cwd -l h_rt=1:0:0 -R y -pe mpi 8 wait.sh
Your job 50 ("wait.sh") has been submitted
$ qsub -clear -cwd -l h_rt=1:0:0 -R y -pe mpi 9 wait.sh
Your job 51 ("wait.sh") has been submitted
If I have MONITOR=true set in the scheduler configuration, I can see that only the 8-slot job has resources reserved for it. The 9-slot
job is left to starve.
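That matches the arithmetic above: 4 hosts x 8 slots = 32 in TOTAL, minus 8 slots in cdsuE for the disabled smp4 instance, minus the 16
slots on smp2 and smp3 held host-exclusively by job 45, leaves 8. So the 8-slot job 50 can be given a reservation, but the 9-slot job 51
never will be. The reservation decisions show up in the schedule file the scheduler writes while monitoring is on (roughly; the path
assumes the default cell):
$ qconf -msconf     # set MONITOR=true (or MONITOR=1) in the "params" line
$ grep ":RESERVING:" $SGE_ROOT/default/common/schedule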