Custom Query (431 matches)

Results (109 - 111 of 431)

Ticket #1549 (fixed): Project usage is not saved across qmaster restarts
Owner: Mark Dixon <m.c.dixon@…>   Reporter: markdixon
Description

Hi,

A restart of the qmaster throws away sharetree project usage. This is because project usage is stored in the spool against user objects, not project objects.

The attached patch initialises project usage by walking through the user objects.
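
Not the attached patch itself, just a minimal sketch of the idea under illustrative assumptions - the structures and names below are hypothetical, not the actual Grid Engine objects:

/* Hypothetical sketch only: illustrative structures, not the actual
 * Grid Engine objects or the attached patch. Project usage is rebuilt
 * after a qmaster restart by walking the spooled user objects, since
 * usage is recorded there per user rather than per project.            */
#include <stdio.h>
#include <string.h>

#define MAX_PROJECTS 16

struct project_usage {               /* per-project usage seen by one user */
   const char *project;
   double cpu;
};

struct user_obj {                    /* spooled user object (illustrative) */
   const char *name;
   struct project_usage upu[MAX_PROJECTS];
   int n_upu;
};

struct project_obj {                 /* in-memory project object to rebuild */
   const char *name;
   double cpu;
};

static void init_project_usage(struct project_obj *projects, int n_projects,
                               const struct user_obj *users, int n_users)
{
   for (int p = 0; p < n_projects; p++) {
      projects[p].cpu = 0.0;
      for (int u = 0; u < n_users; u++)
         for (int i = 0; i < users[u].n_upu; i++)
            if (strcmp(users[u].upu[i].project, projects[p].name) == 0)
               projects[p].cpu += users[u].upu[i].cpu;
   }
}

int main(void)
{
   struct user_obj users[] = {
      { "alice", { { "projA", 100.0 }, { "projB", 20.0 } }, 2 },
      { "bob",   { { "projA",  50.0 } }, 1 },
   };
   struct project_obj projects[] = { { "projA", 0.0 }, { "projB", 0.0 } };

   init_project_usage(projects, 2, users, 2);
   for (int p = 0; p < 2; p++)
      printf("%s cpu=%.1f\n", projects[p].name, projects[p].cpu);
   return 0;
}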

It has only been tested against 8.1.5, but the patch has been prepared against 8.1.8 (and checked that it compiles OK).

Cheers,

Mark

Ticket #1550 (fixed): Better job scheduling within a sharetree node
Owner: Mark Dixon <m.c.dixon@…>   Reporter: markdixon
Description

When using the sharetree policy, jobs are assigned a priority based upon a hierarchical tree. Pending jobs located in the same sharetree node are currently sorted by a very simple algorithm - this enhancement is an attempt to help it take parallel jobs into consideration.

Hoping this will improve scheduling on a cluster where job sizes vary by a factor of around 10^3. Will be trying it out over the next few months. Presumably the functional policy might need a similar modification.

From the patch:

Enhmt #xxx sharetree node priority scaled by slot (not job) count

Previously, when a sharetree node had pending jobs, each job was assigned the number of sharetree tickets (stcks) due to the node and then scaled based on how many running and pending jobs the node had ahead of it - sum(job_ahead).

This changes it to be related to the number of assigned slots that the node has ahead of the job - sum(job_ahead*slots).

For example, if there are no jobs running and a single job pending, the pending job will still receive the full number of stcks due to the node. If there is one 8-slot job running and one pending, the pending job will receive 1/9 of the stcks due to the node instead of 1/2.

There is no doubt more accurate maths this could be based on, such as something involving the usage_weight_list config option, and more accurate measures of slots (here we simply take the minimum of the first PE range in the job request). This is an attempt at a first-order correction, allowing more complicated calculations later if necessary.
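
As an illustration only (simplified, hypothetical names, not the actual scheduler code or the patch), the change amounts to dividing the stcks due to the node by 1 + slots ahead rather than 1 + jobs ahead:

/* Illustrative sketch of the scaling change described above; not the
 * actual scheduler code or the patch.                                   */
#include <stdio.h>

struct job {
   int slots;   /* assumed: minimum of the first PE range in the request */
};

/* Old behaviour: stcks due to the node divided by 1 + jobs ahead.       */
static double stcks_per_job_old(double node_stcks, int jobs_ahead)
{
   return node_stcks / (double)(1 + jobs_ahead);
}

/* New behaviour: divide by 1 + total slots of the jobs ahead instead.   */
static double stcks_per_job_new(double node_stcks, const struct job *ahead,
                                int n_ahead)
{
   int slots_ahead = 0;
   for (int i = 0; i < n_ahead; i++)
      slots_ahead += ahead[i].slots;
   return node_stcks / (double)(1 + slots_ahead);
}

int main(void)
{
   /* The example from the ticket: one 8-slot job running, one pending.  */
   struct job running[] = { { 8 } };
   double node_stcks = 900.0;

   printf("old: %.1f stcks (1/2 of the node's stcks)\n",
          stcks_per_job_old(node_stcks, 1));
   printf("new: %.1f stcks (1/9 of the node's stcks)\n",
          stcks_per_job_new(node_stcks, running, 1));
   return 0;
}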

It is hoped that this change will make the sharetree policy fairer for nodes with a job mix containing jobs with a variety of slot counts.

Feedback welcome!

Thanks,

Mark

Ticket #1551 (fixed): Spool not flushed at qmaster exit
Owner: Mark Dixon <m.c.dixon@…>   Reporter: markdixon
Description

The qmaster takes great pains to flush out all the spool objects when it does a normal exit. This is important because gridengine also tries hard to rate-limit the frequency of object updates, meaning some little-used objects can be permanently out of date.

However, it also takes great pains NOT to flush the spool objects if it notices that another qmaster has fiddled with the files - to avoid file corruption.

Unfortunately, an "if" test is reversed, so it only actually flushes under precisely the condition with the greatest chance of file corruption, and not otherwise.

Patch follows to correct this, prepared against 8.1.8.
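
Not the actual qmaster code or the patch, just a minimal sketch with hypothetical names of the kind of inversion described:

/* Illustrative sketch only, with hypothetical names: shows the shape of
 * the inverted test described above, not the actual qmaster code.       */
#include <stdbool.h>
#include <stdio.h>

/* Would check whether another qmaster has modified the spool files;
 * hard-coded here for illustration.                                     */
static bool spool_tampered_with(void)
{
   return false;
}

static void flush_spool(void)
{
   printf("flushing spool objects\n");
}

/* Reversed test: flushes only when another qmaster appears to own the
 * spool (maximal corruption risk) and skips the flush on a clean exit.  */
static void qmaster_exit_buggy(void)
{
   if (spool_tampered_with())
      flush_spool();
}

/* Corrected test: flush on a clean exit, skip if someone else has
 * fiddled with the spool files.                                         */
static void qmaster_exit_fixed(void)
{
   if (!spool_tampered_with())
      flush_spool();
}

int main(void)
{
   qmaster_exit_buggy();   /* prints nothing */
   qmaster_exit_fixed();   /* flushes        */
   return 0;
}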

As this code path has very rarely been used, probably worth testing it a bit before putting into production!
