[GE users] High and low priority queues, no starvation, minimum oversubscription
agrajag at dragaera.net
Mon Mar 6 14:53:37 GMT 2006
Ian Fasel wrote:
> I'm trying to make a design where everyone gets a preferred high
> priority machine, but then shares equally for the remaining machines.
> I also want to prevent jobs from getting completely suspended by
> higher-priority jobs.
> I have almost done this. Each user has 2 queues, a high-prio (HP) and
> a low-prio (LP) queue. Each queue has two slots per host. On the HP
> queue, you get your preferred host. It has sequence number 10. Your
> LP queue includes all the remaining hosts, and it is set to niceness
> 20, and sequence number 50. Finally, there is a total_slots consumable
> resource, and each host is configured to have 2 total_slots, and each
> job requests one of these by default. Scheduling is done using
> sequence number. We also have share tickets so scheduling of jobs is
I think my cluster is set up in a similar way to what you want.
I have lowprio.q and highprio.q on all my machines. Using host groups
and user groups, I set the 'user_lists' for highprio.q so that only
people in a certain group can use the high priority queues on certain
nodes. (Specifically, these are the nodes they purchased.)
I've turned off reprioritization and set the nice value to 19 on
lowprio.q and 0 on highprio.q. I've not done anything with sequence
numbers. Rather, I created a complex that looks like this:
#name     shortcut  type  relop  requestable  consumable  default  urgency
highprio  hiprio    BOOL  ==     FORCED       NO          0        100000
And in highprio.q I set:
This way, if a user submits a job with '-l highprio', it runs on
their high priority queues; otherwise it runs on their low priority
queues. This forces the user to decide whether they want to run on
high or low priority queues. Also, if someone has 20 slots across
their high priority queues and submits 40 single-CPU jobs, then 20 of
those jobs will sit in the queue waiting.
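
To make the routing concrete, here is a toy Python model of the behavior
described above. It is only a sketch, not how the Grid Engine scheduler is
implemented: the Job class, the dispatch function and the slot counts are
made-up names and numbers for illustration. It assumes a job that requests
'highprio' can only use highprio.q slots, and a job that does not request
it can only use lowprio.q slots.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    wants_highprio: bool  # True if submitted with '-l highprio'


def dispatch(jobs, highprio_slots, lowprio_slots):
    """Toy model: place each job in the queue its request selects.

    Because the 'highprio' complex is FORCED, a job only lands in
    highprio.q when it asks for it; jobs that ask for it never fall
    back to lowprio.q, they wait instead.
    """
    free = {"highprio.q": highprio_slots, "lowprio.q": lowprio_slots}
    running = {"highprio.q": [], "lowprio.q": []}
    waiting = []
    for job in jobs:
        queue = "highprio.q" if job.wants_highprio else "lowprio.q"
        if free[queue] > 0:
            free[queue] -= 1
            running[queue].append(job.name)
        else:
            waiting.append(job.name)
    return running, waiting


# 40 single-CPU jobs, all submitted with '-l highprio', 20 highprio slots.
jobs = [Job(f"job{i}", True) for i in range(40)]
running, waiting = dispatch(jobs, highprio_slots=20, lowprio_slots=20)
print(len(running["highprio.q"]), len(waiting))
```

In this model the 20 free lowprio.q slots stay empty, because none of the
waiting jobs is allowed to use them.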
In order to keep nodes from getting overrun, on lowprio.q I set:
However, the load_thresholds for highprio.q are set to 'NONE'.
This way if a node is running two low priority jobs, a high priority job
can still be run on that node. But if a node is running high priority
jobs, then a low priority job won't be run on it.
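
The effect of the two load_thresholds settings can be sketched in Python as
follows. This is a simplified model under stated assumptions: the threshold
is expressed on np_load_avg, and the value 1.0 is a placeholder, not the
value used on this cluster. The function name is also made up for the
example.

```python
# Hypothetical thresholds: None models load_thresholds set to 'NONE'.
LOAD_THRESHOLDS = {
    "lowprio.q": 1.0,    # placeholder value, an assumption for illustration
    "highprio.q": None,  # no load threshold at all
}


def queue_accepts_jobs(queue, np_load_avg):
    """Return True if the queue instance on a host may still receive jobs.

    A queue with no threshold always accepts (slots permitting); a queue
    with a threshold stops accepting once the host load reaches it.
    """
    threshold = LOAD_THRESHOLDS[queue]
    if threshold is None:
        return True
    return np_load_avg < threshold


# A node kept fully busy by high priority jobs (normalized load about 1.0):
busy_load = 1.0
print(queue_accepts_jobs("lowprio.q", busy_load))   # low priority held back
print(queue_accepts_jobs("highprio.q", busy_load))  # high priority still allowed
```

The asymmetry is the point: load generated by any job blocks lowprio.q, but
nothing blocks highprio.q except its own slot count.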
Also note the urgency setting for the 'highprio' complex. This is tuned
against the weight_ settings in sched_conf so that any high priority job
will always be scheduled before any low priority job. This helps avoid
the situation where a low priority job is dispatched to a host and, in
the same scheduling run, high priority jobs are dispatched to the same
box. (This also requires load_adjustments to be set up.)
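
As a rough illustration of why the urgency value has to be tuned against the
weights, here is a simplified Python model of the ordering. It is an
assumption-laden sketch, not the scheduler's actual formula: it only
considers an urgency term and a ticket term, normalizes each to the range
0..1 across the pending jobs, and uses illustrative weights (0.1 and 0.01)
that are not taken from this cluster's sched_conf.

```python
HIGHPRIO_URGENCY = 100000  # urgency column of the 'highprio' complex
WEIGHT_URGENCY = 0.1       # illustrative weight, an assumption
WEIGHT_TICKET = 0.01       # illustrative weight, an assumption


def normalize(values):
    """Scale values to 0..1; if all are equal, use 0.5 (a simplification)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def schedule_order(jobs):
    """jobs: list of (name, requests_highprio, tickets) tuples.

    Returns job names sorted from first-scheduled to last-scheduled.
    """
    urgencies = normalize([HIGHPRIO_URGENCY if hp else 0 for _, hp, _ in jobs])
    tickets = normalize([t for _, _, t in jobs])
    scored = [
        (WEIGHT_URGENCY * u + WEIGHT_TICKET * t, name)
        for (name, _, _), u, t in zip(jobs, urgencies, tickets)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


pending = [
    ("low_a", False, 900),
    ("high_a", True, 100),
    ("low_b", False, 500),
    ("high_b", True, 300),
]
order = schedule_order(pending)
print(order)
```

With these weights the largest possible ticket contribution (0.01) is smaller
than the urgency contribution of a high priority job (0.1), so a job that
requests 'highprio' sorts ahead of every job that does not, regardless of
tickets.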
Hopefully this will help you out or at least give you some ideas. If
you have any questions on this, please let me know.