[GE issues] [Issue 3220] New - exclusive host access prevents resource reservation for waiting jobs

ccaamad m.c.dixon at leeds.ac.uk
Mon Jan 11 15:31:15 GMT 2010


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3220
                 Issue #|3220
                 Summary|exclusive host access prevents resource reservation for waiting jobs
               Component|gridengine
                 Version|6.2u4
                Platform|PC
                     URL|
              OS/Version|Linux
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P2
            Subcomponent|scheduling
             Assigned to|andreas
             Reported by|ccaamad

------- Additional comments from ccaamad at sunsource.net Mon Jan 11 07:31:11 -0800 2010 -------
If there are jobs running with exclusive=true, resource reservation removes the slots on those hosts from consideration for waiting jobs.
This makes the "exclusive" feature useless to me: I wanted to use it to pack each parallel job onto the minimum number of hosts.

Look at "qstat -g c". Add up the numbers in the "TOTAL" column. Subtract the numbers in the "cdsuE" column. Subtract the number of slots
belonging to queue instances with a host-exclusive job in them. The number you are left with is the size of the biggest parallel job that
will have resources reserved for it. Anything bigger will be starved by smaller waiting jobs.
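For instance, with 32 slots in TOTAL, 8 of them in cdsuE states and 16 on hosts held by an exclusive job (the numbers of the walk-through
below), the biggest job that can still get a reservation is 32 - 8 - 16 = 8 slots.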

e.g.

Create a test cluster with a single queue and four 8-slot exec hosts. Enable exclusive job scheduling on all hosts and, for illustration
purposes, disable one of the queue instances.
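One way to get there (a sketch: the "exclusive" complex definition shown is the one shipped with 6.2u3 and later, and the hosts'
complex_values are assumed to be otherwise empty):

$ qconf -sc | grep exclusive
exclusive   excl   BOOL   EXCL   YES   YES   0   1000
$ for h in smp1 smp2 smp3 smp4; do
>     # attach the exclusive consumable to each host (-mattr replaces complex_values)
>     qconf -mattr exechost complex_values exclusive=true $h.arc1.leeds.ac.uk
> done
$ qmod -d smp.q@smp4.arc1.leeds.ac.uk    # disable one queue instance

The resulting cluster state: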

$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
smp.q@smp1.arc1.leeds.ac.uk       BIP   0/0/8          0.00     lx24-amd64
---------------------------------------------------------------------------------
smp.q@smp2.arc1.leeds.ac.uk       BIP   0/0/8          0.00     lx24-amd64
---------------------------------------------------------------------------------
smp.q@smp3.arc1.leeds.ac.uk       BIP   0/0/8          0.00     lx24-amd64
---------------------------------------------------------------------------------
smp.q@smp4.arc1.leeds.ac.uk       BIP   0/0/8          0.00     lx24-amd64    d


Submit a 14-slot host-exclusive job and an ordinary 1-slot job:

$ qsub -clear -cwd -l h_rt=1:0:0,exclusive=true -R y -pe mpi 14 wait.sh
Your job 45 ("wait.sh") has been submitted
$ qsub -clear -cwd -l h_rt=1:0:0 -R y wait.sh
Your job 49 ("wait.sh") has been submitted
$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
smp.q@smp1.arc1.leeds.ac.uk       BIP   0/1/8          0.00     lx24-amd64
     49 0.50500 wait.sh    issmcd       r     01/11/2010 15:02:48     1
---------------------------------------------------------------------------------
smp.q@smp2.arc1.leeds.ac.uk       BIP   0/8/8          0.00     lx24-amd64
     45 0.60500 wait.sh    issmcd       r     01/11/2010 14:59:24     8
---------------------------------------------------------------------------------
smp.q@smp3.arc1.leeds.ac.uk       BIP   0/6/8          0.00     lx24-amd64
     45 0.60500 wait.sh    issmcd       r     01/11/2010 14:59:24     6
---------------------------------------------------------------------------------
smp.q@smp4.arc1.leeds.ac.uk       BIP   0/0/8          0.00     lx24-amd64    d


Submit an 8-slot and a 9-slot job:

$ qsub -clear -cwd -l h_rt=1:0:0 -R y -pe mpi 8 wait.sh
Your job 50 ("wait.sh") has been submitted
$ qsub -clear -cwd -l h_rt=1:0:0 -R y -pe mpi 9 wait.sh
Your job 51 ("wait.sh") has been submitted

With MONITOR=true set in the scheduler configuration, I can see that only the 8-slot job has resources reserved for it. The 9-slot job
is left to starve.
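For reference, a sketch of how to see this (assuming the default cell, so the monitoring file is $SGE_ROOT/default/common/schedule;
reservations are recorded there as RESERVING entries):

$ qconf -msconf       # in the editor, add MONITOR=1 to the params line
$ grep RESERVING $SGE_ROOT/default/common/schedule

The result matches the arithmetic from the top of this report: TOTAL = 32, cdsuE = 8 (smp4 is disabled), and 16 slots sit on the two
hosts held exclusively by job 45, so the biggest reservable job is 32 - 8 - 16 = 8 slots. Job 50 fits and gets a reservation; job 51
does not.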

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=238118
