Opened 4 years ago

Closed 4 years ago

#1550 closed enhancement (fixed)

Better job scheduling within a sharetree node

Reported by: markdixon
Owned by: Mark Dixon <m.c.dixon@…>
Priority: normal
Milestone:
Component: sge
Version: 8.1.8
Severity: minor
Keywords:
Cc:

Description

When using the sharetree policy, jobs are assigned a priority based on a hierarchical tree. Pending jobs in the same sharetree node are currently ordered by a very simple algorithm that counts jobs and ignores how many slots each one uses. This enhancement is an attempt to make it take parallel jobs into account.

I am hoping this will improve scheduling on a cluster with job sizes that vary by x10e3, and will be trying it out over the next few months. Presumably the functional policy would need a similar modification.

From the patch:

Enhmt #xxx sharetree node priority scaled by slot (not job) count

When a sharetree node had pending jobs, each job was assigned the
number of sharetree tickets (stcks) due to the node, scaled down
according to how many running and pending jobs the node had ahead of
it - sum(job_ahead).

This changes the scaling to use the number of assigned slots that the
node has ahead of the job - sum(job_ahead*slots).

e.g. If there are no jobs running and a single job pending, the pending
job will still receive the full number of stcks due to the node. If
there is one 8 slot job running and one pending, the pending job will
receive 1/9 of the stcks due to the node, instead of 1/2.
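
The arithmetic in this example can be sketched as below. This is only an illustration, not the patch itself: the function name `pending_job_share` and the `1 + ...` divisor are assumptions inferred from the two cases described above (full stcks with nothing ahead, 1/9 with one 8 slot job ahead).

```python
from fractions import Fraction


def pending_job_share(node_tickets, slots_of_jobs_ahead, by_slots=True):
    """Tickets given to one pending job in a sharetree node.

    slots_of_jobs_ahead: slot count of each running or pending job that
    the node has ahead of this job.
    by_slots=True  -> new behaviour, sum(job_ahead*slots)
    by_slots=False -> old behaviour, sum(job_ahead)
    The "1 +" in the divisor is an assumption chosen to match the
    worked example in the commit message.
    """
    if by_slots:
        ahead = sum(slots_of_jobs_ahead)
    else:
        ahead = len(slots_of_jobs_ahead)
    return Fraction(node_tickets) / (1 + ahead)


# Nothing ahead: the pending job gets the full stcks due to the node.
print(pending_job_share(1, []))                   # 1
# One 8 slot job ahead, old and new scaling.
print(pending_job_share(1, [8], by_slots=False))  # 1/2
print(pending_job_share(1, [8]))                  # 1/9
```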

There are no doubt more accurate calculations it could be based on, such
as one using the usage_weight_list config option, and more accurate
measures of slots (here we simply take the minimum of the first PE range
in the job request). This is an attempt at a first-order correction,
which leaves room for more complicated calculations later if necessary.
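
A rough sketch of the slot measure mentioned here, under stated assumptions: the function name `estimated_slots` is hypothetical, a job's PE request is modelled as a list of `(low, high)` ranges rather than parsed from real qsub syntax, and counting a job with no PE request as one slot is an assumption, not something the commit message states.

```python
def estimated_slots(pe_ranges):
    """Estimate a job's slot count from its PE request.

    pe_ranges: list of (low, high) tuples, one per range in the request.
    Only the first range is considered, and its minimum is used.
    """
    if not pe_ranges:
        # Assumption: a job without a PE request occupies a single slot.
        return 1
    low, high = pe_ranges[0]
    return min(low, high)


# A request for 4-8 slots, or exactly 16, is counted as 4 slots.
print(estimated_slots([(4, 8), (16, 16)]))  # 4
```

Using the lower bound is cheap to compute but undercounts jobs that end up running with more slots, which is one of the inaccuracies acknowledged above.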

It is hoped that this change will make the sharetree policy fairer for
nodes with a job mix containing jobs with a variety of slot counts.

Feedback welcome!

Thanks,

Mark

Change History (7)

comment:1 Changed 4 years ago by Mark Dixon <m.c.dixon@…>

  • Owner set to Mark Dixon <m.c.dixon@…>
  • Resolution set to fixed
  • Status changed from new to closed

In 4838/sge:

Fix #1550: Better job scheduling within a sharetree node
