Custom Query (431 matches)

Results (37 - 39 of 431)

#1550 (fixed): Better job scheduling within a sharetree node (owner: Mark Dixon <m.c.dixon@…>, reporter: markdixon)
Description

When using the sharetree policy, jobs are assigned a priority based upon a hierarchical tree. Pending jobs located in the same sharetree node are currently sorted by a very simple algorithm - this enhancement is an attempt to help it take parallel jobs into consideration.

Hoping this will improve scheduling on a cluster where job sizes vary by a factor of around 10^3. Will be trying it out over the next few months. Presumably the functional policy might need a similar modification.

From the patch:

Enhmt #xxx sharetree node priority scaled by slot (not job) count

When a sharetree node has pending jobs, each job was assigned the number of sharetree tickets (stcks) due to the node, scaled by how many running and pending jobs the node had ahead of that job - sum(job_ahead).

This changes the scaling to use the number of assigned slots the node has ahead of the job instead - sum(job_ahead*slots).

For example, if there are no jobs running and a single job pending, the pending job will still receive the full number of stcks due to the node. If there is one 8-slot job running and one pending, the pending job will receive 1/9 of the stcks due to the node, instead of 1/2.
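A minimal sketch of the before/after scaling in plain Python (assuming the scale factor is 1/(1 + ahead), which matches the example above but is not taken verbatim from the patch):

def share_fraction(slots_ahead, count_slots=True):
    # slots_ahead: slot counts of the running/pending jobs ahead of this
    #              job at the same sharetree node.
    # count_slots: True  -> patched behaviour (scale by slots ahead)
    #              False -> old behaviour (scale by number of jobs ahead)
    ahead = sum(slots_ahead) if count_slots else len(slots_ahead)
    return 1.0 / (1 + ahead)

# No jobs ahead: the pending job still gets the full stcks either way.
assert share_fraction([]) == 1.0
# One 8-slot job ahead: 1/2 of the stcks before the patch, 1/9 after it.
assert share_fraction([8], count_slots=False) == 1/2
assert share_fraction([8], count_slots=True) == 1/9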

No doubt there is more accurate maths this could be based on, such as something using the usage_weight_list config option, along with more accurate measures of slot count (here we simply take the minimum of the first PE range in the job request). This is an attempt at a first-order correction, allowing more complicated calculations later if necessary.

It is hoped that this change will make the sharetree policy fairer for nodes with a job mix containing jobs with a variety of slot counts.

Feedback welcome!

Thanks,

Mark

#1549 (fixed): Project usage is not saved across qmaster restarts (owner: Mark Dixon <m.c.dixon@…>, reporter: markdixon)
Description

Hi,

A restart of the qmaster throws away sharetree project usage. This is because project usage is stored in the spool by user objects and not project objects.

The attached patch initialises project usage by walking through the user objects.
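A rough illustration of the idea in plain Python (hypothetical data structures, not the qmaster's actual C objects): rebuild the per-project totals at start-up by walking the spooled per-user usage records.

def rebuild_project_usage(users):
    # users: spooled per-user records, e.g.
    #   {"name": "alice", "project_usage": {"projA": 12.5, "projB": 3.0}}
    # Returns project name -> accumulated usage, used to (re)initialise
    # the project objects instead of letting them start from zero.
    projects = {}
    for user in users:
        for project, usage in user.get("project_usage", {}).items():
            projects[project] = projects.get(project, 0.0) + usage
    return projects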

It's only been tested against 8.1.5, but this patch has been prepared against 8.1.8 (and checked that it compiles ok).

Cheers,

Mark

#1546 (fixed): qsub -terse option performs oddly with non-critical errors (reporter: agrothberg)
Description

I am trying to submit a job using the -terse option:

$ qsub -terse  -S /usr/bin/python hello_world_delay.py 1> /dev/null
Unable to run job: warning: ec2-user's job is not allowed to run in any queue
Your job 33 ("hello_world_delay.py") has been submitted
Exiting.

The job IS actually submitted, but -terse does not work correctly.

The entire message is also sent to standard error (the following produces no output):

$ qsub -terse  -S /usr/bin/python hello_world_delay.py 2> /dev/null
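For reference, a small Python check of where the output ends up (hypothetical script name and paths; with a correctly working -terse one would expect only the bare job ID on stdout and the warning on stderr):

import subprocess

result = subprocess.run(
    ["qsub", "-terse", "-S", "/usr/bin/python", "hello_world_delay.py"],
    capture_output=True, text=True,
)
print("stdout:", repr(result.stdout))  # expected: just the job ID, e.g. "33\n"
print("stderr:", repr(result.stderr))  # expected: the non-critical warning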