Opened 12 years ago

Last modified 9 years ago

#418 new enhancement

IZ2226: Testsuite needs relevant performance test for scheduler dispatch times with resource quotas

Reported by: andreas
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 6.1beta2
Severity:
Keywords: testsuite
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2226]

   Issue #:              2226
   Reporter:             andreas (andreas)
   Component:            gridengine
   Subcomponent:         testsuite
   Platform:             All
   OS:                   All
   Version:              6.1beta2
   CC:                   None defined
   Status:               NEW
   Priority:             P3
   Resolution:
   Issue type:           ENHANCEMENT
   Target milestone:     ---
   Assigned to:          joga (joga)
   QA Contact:           joga
   URL:
   Summary:              Testsuite needs relevant performance test for scheduler dispatch times with resource quotas
   Status whiteboard:
   Attachments:
   Issue 2226 blocks:
   Votes for issue 2226:

   Opened: Thu Mar 29 08:50:00 -0700 2007
------------------------


DESCRIPTION:
Configure 100 queues Q001-Q100. The queues should have no load thresholds,
4 slots per node and be available for @allhosts. Use at least 4 execution
hosts so as to get halfway realistic performance behavior. Enable scheduler
profiling with the sched_conf(5) params setting PROFILE=true. Configure five
projects Project1-Project5. Configure 10 INT consumable resources F001-F010,
each with a global capacity of 100. Configure a resource quota that limits
use of F001-F010 to 1 per project:

  limit  projects {*} to F001=1,F002=1,F003=1,F004=1,F005=1,F006=1,F007=1,F008=1,F009=1,F010=1
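For a testsuite implementation the rule text has to be generated rather than typed by hand. The following is a minimal sketch of that, not part of the existing testsuite: the helper name quota_rule and its parameters are hypothetical, and the optional queue filter corresponds to the single-queue variation described in the follow-up comment of this issue.

```python
def quota_rule(queues=None, limit=1):
    """Build one resource quota 'limit' rule for F001-F010 over all projects.

    queues: optional list of queue names; None means no queue filter.
    limit:  per-project limit for each consumable (1 in this issue).
    """
    # F001=1,F002=1,...,F010=1
    resources = ",".join(f"F{i:03d}={limit}" for i in range(1, 11))
    # Optional "queues Q001 " part used by the single-queue variation.
    scope = f"queues {','.join(queues)} " if queues else ""
    return f"limit {scope}projects {{*}} to {resources}"


if __name__ == "__main__":
    print(quota_rule())
    print(quota_rule(queues=["Q001"]))
```

The sketch only produces the rule line; how the rule is loaded into the cluster is left to the testsuite.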

Before job submission, disable all queues using qmod -d "*" and remove the
sge_schedd messages file.

Then submit 1000 sequential jobs: for each of the five projects, submit a series
of 20 identical jobs for each request from -l F001=1 to -l F010=1
(5*10*20 = 1000). Jobs can be normal sleeper jobs that run for 5 minutes or
longer, so that the remaining jobs stay pending.
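The 5*10*20 submission pattern can be sketched as follows. This is only an illustration under assumptions: the sleeper script path and the 300 second sleep time are assumed, the helper name build_submit_commands is hypothetical, and the code only builds the qsub command lines (using the -P and -l options) without running them.

```python
from itertools import product

PROJECTS = [f"Project{i}" for i in range(1, 6)]    # Project1-Project5
RESOURCES = [f"F{i:03d}" for i in range(1, 11)]    # F001-F010
JOBS_PER_SERIES = 20

# Assumption: location of a sleeper script; adjust to the local installation.
SLEEPER = "$SGE_ROOT/examples/jobs/sleeper.sh"


def build_submit_commands(sleep_seconds=300):
    """Return one qsub argument list per job, without executing anything."""
    commands = []
    for project, resource in product(PROJECTS, RESOURCES):
        for _ in range(JOBS_PER_SERIES):
            commands.append(
                ["qsub", "-P", project, "-l", f"{resource}=1",
                 SLEEPER, str(sleep_seconds)]
            )
    return commands


if __name__ == "__main__":
    # Print the commands as lines of a shell script, so that $SGE_ROOT
    # is expanded by the shell that eventually runs them.
    for argv in build_submit_commands():
        print(" ".join(argv))
```

Writing the commands out instead of executing them keeps the sketch usable on a machine without a Grid Engine installation.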

When all jobs are submitted, enable all queues using qmod -e "*" and record the
first 'n' schedd profiling messages that contain the job dispatching time using

   # grep "job dispatching took" $SGE_ROOT/default/spool/qmaster/schedd/messages | head -7

Before the fix, the following numbers were typical:

   03/28/2007 17:49:59|schedd|es-ergb01-01|P|PROF: job dispatching took 3.840 s (1000 fast, 0 comp, 0 pe, 0 res)
   03/28/2007 17:50:05|schedd|es-ergb01-01|P|PROF: job dispatching took 2.180 s (950 fast, 0 comp, 0 pe, 0 res)
   03/28/2007 17:50:10|schedd|es-ergb01-01|P|PROF: job dispatching took 2.160 s (950 fast, 0 comp, 0 pe, 0 res)
   03/28/2007 17:50:15|schedd|es-ergb01-01|P|PROF: job dispatching took 2.170 s (950 fast, 0 comp, 0 pe, 0 res)
   03/28/2007 17:50:20|schedd|es-ergb01-01|P|PROF: job dispatching took 2.130 s (950 fast, 0 comp, 0 pe, 0 res)
   03/28/2007 17:50:26|schedd|es-ergb01-01|P|PROF: job dispatching took 2.170 s (950 fast, 0 comp, 0 pe, 0 res)
   03/28/2007 17:50:31|schedd|es-ergb01-01|P|PROF: job dispatching took 2.170 s (950 fast, 0 comp, 0 pe, 0 res)

After the fix, these numbers are typical:

   03/29/2007 16:24:10|schedd|es-ergb01-01|P|PROF: job dispatching took 1.580 s (1000 fast, 0 comp, 0 pe, 0 res)
   03/29/2007 16:24:12|schedd|es-ergb01-01|P|PROF: job dispatching took 0.020 s (950 fast, 0 comp, 0 pe, 0 res)
   03/29/2007 16:24:17|schedd|es-ergb01-01|P|PROF: job dispatching took 0.020 s (950 fast, 0 comp, 0 pe, 0 res)
   03/29/2007 16:24:22|schedd|es-ergb01-01|P|PROF: job dispatching took 0.030 s (950 fast, 0 comp, 0 pe, 0 res)
   03/29/2007 16:24:27|schedd|es-ergb01-01|P|PROF: job dispatching took 0.030 s (950 fast, 0 comp, 0 pe, 0 res)
   03/29/2007 16:24:32|schedd|es-ergb01-01|P|PROF: job dispatching took 0.030 s (950 fast, 0 comp, 0 pe, 0 res)
   03/29/2007 16:24:37|schedd|es-ergb01-01|P|PROF: job dispatching took 0.030 s (950 fast, 0 comp, 0 pe, 0 res)
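To turn these messages into an automated check, the testsuite would need to extract the dispatch times from the schedd messages file. A minimal sketch, assuming the message format shown above; the helper names dispatch_runs and steady_state_mean are hypothetical, and skipping the first run (the one with 1000 pending jobs) is a design choice of this sketch, not something the issue prescribes.

```python
import re

# Matches the PROF message whether or not it is wrapped before "(".
PROF_RE = re.compile(
    r"job dispatching took (\d+\.\d+) s\s*"
    r"\((\d+) fast, (\d+) comp, (\d+) pe, (\d+) res\)"
)


def dispatch_runs(text):
    """Return a (seconds, fast_jobs) tuple per profiling message in text."""
    return [(float(m.group(1)), int(m.group(2)))
            for m in PROF_RE.finditer(text)]


def steady_state_mean(text):
    """Mean dispatch time over all runs except the first one."""
    times = [seconds for seconds, _ in dispatch_runs(text)][1:]
    if not times:
        raise ValueError("need at least two profiling messages")
    return sum(times) / len(times)


if __name__ == "__main__":
    # Two messages copied from the "after the fix" listing in this issue.
    sample = (
        "03/29/2007 16:24:10|schedd|es-ergb01-01|P|PROF: job dispatching "
        "took 1.580 s (1000 fast, 0 comp, 0 pe, 0 res)\n"
        "03/29/2007 16:24:12|schedd|es-ergb01-01|P|PROF: job dispatching "
        "took 0.020 s (950 fast, 0 comp, 0 pe, 0 res)\n"
    )
    print(dispatch_runs(sample))
    print(steady_state_mean(sample))
```

A test could then compare the steady-state mean against a threshold chosen for the test cluster; the issue itself does not specify one.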

   ------- Additional comments from andreas Thu Mar 29 09:53:19 -0700 2007 -------
A worthwhile variation of this test is to (a) send all jobs only into a single
queue Q001, (b) increase the slot count of Q001 to 25 so that 50 jobs can still
be dispatched at a time, and (c) use a limitation rule that applies only to Q001:

  limit  queues Q001 projects {*} to F001=1,F002=1,F003=1,F004=1,F005=1,F006=1,F007=1,F008=1,F009=1,F010=1

For sending jobs into queue Q001, various possibilities exist:

(1) request "-q Q001"
(2) attach a user defined string complex attribute 'type' to all queues with
type=<qname> and have jobs request "-l type=Q001"
(3) attach a user defined int complex attribute 'number' to all queues with
number='queue-number' and have jobs request "-l number=001"
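The three possibilities differ only in the extra qsub arguments attached to each job, which can be sketched as below. The helper name queue_request is hypothetical; the complex attributes 'type' and 'number' are the user defined ones named in options (2) and (3) and must already be attached to the queues.

```python
def queue_request(option, queue="Q001"):
    """Return the extra qsub arguments that steer a job into the given queue.

    option 1: hard queue request, option 2: string complex 'type',
    option 3: int complex 'number'.
    """
    number = queue.lstrip("Q")  # "Q001" -> "001"
    variants = {
        1: ["-q", queue],
        2: ["-l", f"type={queue}"],
        3: ["-l", f"number={number}"],
    }
    return variants[option]


if __name__ == "__main__":
    for option in (1, 2, 3):
        print(option, " ".join(queue_request(option)))
```

Running the test once per option would show whether dispatch times depend on how the queue is selected.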
