#767 closed defect (fixed)

IZ3220: exclusive host access prevents resource reservation for waiting jobs

Reported by:  ccaamad
Owned by:
Priority:     high
Milestone:
Component:    sge
Version:      6.2u4
Severity:     minor
Keywords:     PC Linux scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3220]

    Issue #:           3220
    Platform:          PC
    Reporter:          ccaamad (ccaamad)
    Component:         gridengine
    OS:                Linux
    Subcomponent:      scheduling
    Version:           6.2u4
    CC:                None defined
    Status:            NEW
    Priority:          P2
    Resolution:
    Issue type:        DEFECT
    Target milestone:  ---
    Assigned to:       andreas (andreas)
    QA Contact:        andreas
    URL:
    Summary:           exclusive host access prevents resource reservation for waiting jobs
    Status whiteboard:
    Attachments:

    Issue 3220 blocks:
    Votes for issue 3220:


   Opened: Mon Jan 11 08:31:00 -0700 2010 
------------------------


If there are jobs running with exclusive=true set, the slots on the hosts they occupy are removed from consideration when resource reservation is done for waiting jobs.
This makes the "exclusive" feature useless to me: I wanted to use it to pack each parallel job onto the minimum number of hosts.

Look at "qstat -g c". Add up the numbers in the "TOTAL" column, subtract the numbers in the "cdsuE" column, then subtract the slots belonging to queue
instances that currently hold a host-exclusive job. The number you are left with is the size of the biggest parallel job that will have
resources reserved for it. Any bigger job is starved by smaller waiting jobs.
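
A rough way to automate the first two steps (a sketch only, assuming the 6.2 "qstat -g c" layout where TOTAL is the third column from the
right and cdsuE the last); the slots belonging to host-exclusive queue instances still have to be subtracted by hand:

$ qstat -g c | awk 'NR > 2 { total += $(NF-2); unavail += $NF } END { print total - unavail }'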

e.g.

Create a test cluster with a single queue and four 8-slot exec hosts. Enable exclusive job scheduling on all hosts. For illustration
purposes, disable one of the queue instances:

$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
smp.q@smp1.arc1.leeds.ac.uk    BIP   0/0/8          0.00     lx24-amd64
---------------------------------------------------------------------------------
smp.q@smp2.arc1.leeds.ac.uk    BIP   0/0/8          0.00     lx24-amd64
---------------------------------------------------------------------------------
smp.q@smp3.arc1.leeds.ac.uk    BIP   0/0/8          0.00     lx24-amd64
---------------------------------------------------------------------------------
smp.q@smp4.arc1.leeds.ac.uk    BIP   0/0/8          0.00     lx24-amd64    d
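
For reference, a setup like the one above can be scripted roughly as follows (a sketch only; it relies on the built-in "exclusive"
complex that ships with 6.2u3 and later, and uses the host and queue names from this example):

$ for h in smp1 smp2 smp3 smp4; do qconf -mattr exechost complex_values exclusive=true $h.arc1.leeds.ac.uk; done
$ qmod -d smp.q@smp4.arc1.leeds.ac.uk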


Submit a 14-slot host-exclusive job, and an ordinary 1-slot job:

$ qsub -clear -cwd -l h_rt=1:0:0,exclusive=true -R y -pe mpi 14 wait.sh
Your job 45 ("wait.sh") has been submitted
$ qsub -clear -cwd -l h_rt=1:0:0 -R y wait.sh
Your job 49 ("wait.sh") has been submitted
$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
smp.q@smp1.arc1.leeds.ac.uk    BIP   0/1/8          0.00     lx24-amd64
     49 0.50500 wait.sh    issmcd       r     01/11/2010 15:02:48     1
---------------------------------------------------------------------------------
smp.q@smp2.arc1.leeds.ac.uk    BIP   0/8/8          0.00     lx24-amd64
     45 0.60500 wait.sh    issmcd       r     01/11/2010 14:59:24     8
---------------------------------------------------------------------------------
smp.q@smp3.arc1.leeds.ac.uk    BIP   0/6/8          0.00     lx24-amd64
     45 0.60500 wait.sh    issmcd       r     01/11/2010 14:59:24     6
---------------------------------------------------------------------------------
smp.q@smp4.arc1.leeds.ac.uk    BIP   0/0/8          0.00     lx24-amd64    d


Submit an 8-slot and a 9-slot job:

$ qsub -clear -cwd -l h_rt=1:0:0 -R y -pe mpi 8 wait.sh
Your job 50 ("wait.sh") has been submitted
$ qsub -clear -cwd -l h_rt=1:0:0 -R y -pe mpi 9 wait.sh
Your job 51 ("wait.sh") has been submitted

With MONITOR=true set in the scheduler configuration, I can see that only the 8-slot job has resources reserved for it. The 9-slot job
is left to starve.
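
For this cluster, the sum described above works out as 4 hosts x 8 slots = 32, minus the 8 slots in the disabled smp.q@smp4 instance,
minus the 16 slots on smp2 and smp3 (both held exclusively by job 45), leaving 8. That matches what the monitor shows: the 8-slot job 50
gets a reservation, while the 9-slot job 51 never does. (With MONITOR enabled, the scheduler's reservation decisions can be inspected in
$SGE_ROOT/$SGE_CELL/common/schedule.)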

Change History (1)

comment:1 Changed 5 years ago by dlove

  • Resolution set to fixed
  • Severity set to minor
  • Status changed from new to closed

Seems to be fixed now (Mark agrees)
