[GE users] Questions on functional policy, share tree policy, and priority queue
Dan.Templeton at Sun.COM
Tue May 27 18:11:58 BST 2008
Daniel Templeton wrote:
> You're going to have a hard time getting SGE to do anything useful
> without understanding queues. :)
> Let's take it from the top. A queue is where a job runs, not where it
> waits to run. When a job is in the qw (queued and waiting) state, it
> has not yet been assigned to a queue. A job that has been assigned to
> a queue is in the r (running) state (or transferring or suspended).
> In the pre-6.0 days, a queue could only exist on a single host. With
> 6.0, we introduced the idea of "cluster queues". A cluster queue is a
> queue that can span multiple hosts. Under the covers, it's
> essentially a group of pre-6.0 queues, all with the same name, and
> each on a different host. With one caveat: a pre-6.0 queue is composed
> of a long list of required
> attributes, like slots, pe_list, user_list, etc. Starting with 6.0,
> that long list of attributes is only required for the cluster queue.
> All of the queue instances that belong to that cluster queue inherit
> the attribute values for it. The queue instances are allowed,
> however, to override those attribute values with local settings. A
> common example of that is the slots attribute. When you install an
> execution daemon using the install_execd script, it will add a slots
> setting for the queue instance of all.q on that host (noted as
> all.q@host). And if it wasn't already clear, pre-6.0 "queue" == 6.x
> "queue instance". Post-6.0 "queue" == "cluster queue".
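> As a sketch of the override syntax (the hostnames here are hypothetical),
> the slots attribute of a cluster queue can combine a cluster-wide default
> with per-queue-instance values:

```shell
# Hypothetical sketch of per-instance overrides in a cluster queue.
# In the output of "qconf -sq all.q" (or when editing with
# "qconf -mq all.q"), the slots attribute might read:
#
#   slots   8,[node01=4],[node02=4]
#
# i.e. 8 slots by default, but only 4 on the queue instances
# all.q@node01 and all.q@node02.
qconf -sq all.q
```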
> So, aside from governing the number of free slots on a host, what does
> a queue do? It controls the execution context of jobs that run in
> it. It determines what parallel environments are available, what
> file, memory, and CPU time limits should be applied, how the job
> should be started, stopped, suspended, and resumed, what the job's
> nice value is, etc. Queues also have a concept of subordination. A
> queue that is subordinated to another queue will be suspended (along
> with all the jobs running in it) when jobs are running in that other
> queue. By default, the subordinated queue will be suspended when the
> other queue is full, but you can set the number of jobs required to
> suspend the subordinated queue. 1 is a common value, meaning that the
> subordinated queue should be suspended if any jobs are running in the
> other queue. Subordination trees can be arbitrarily complex.
> Circular subordination schemes are permitted, producing a sort of
> mutual exclusion effect.
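> The subordination described above can be set with qconf; a minimal
> sketch using the queue names from this thread:

```shell
# Suspend the all.q instance on a host as soon as 1 slot of
# priority.q is busy there (the default is to suspend only when
# priority.q is full on that host).
qconf -mattr queue subordinate_list all.q=1 priority.q
```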
> One other oddity to point out is that the slot count for a queue is
> not really a queue attribute. It's actually a queue-level resource.
> To allow multiple queues on the same host to share that host's CPUs
> without oversubscribing, you can set the slots resource at the host
> level. Doing so sets a host-wide slots limit, and all queues on that
> host must then share the given number of slots, regardless of how many
> slots each queue may try to offer.
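> A minimal sketch of that host-level slots limit (node01 is a
> hypothetical hostname):

```shell
# Cap the host at 8 slots total; every queue instance on node01
# then draws from this shared pool, whatever its own slots value.
qconf -mattr exechost complex_values slots=8 node01
```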
> Since we're talking about resources, let's talk about one of the
> common queue/resource patterns. By default, there's nothing (other
> than access lists) to prevent a stray job from wandering into a
> queue. That's bad for queues that govern expensive resources or that
> represent special access, like a priority queue. To solve this
> problem, the most common approach is to create a resource that is
> "forced". A forced resource (one that has "FORCED" in the requestable
> column) has the property that any queue or host that offers that
> resource can only be used by jobs requesting that resource (or that
> queue or host, in which case, the resource request is implicit). By
> assigning such queues forced resources, you can guarantee that stray
> jobs can't end up in the queue. A nice side effect is that you can
> also assign an urgency to that resource, meaning that jobs requesting
> that resource (or the queue to which it's assigned) gain (or lose)
> priority when being scheduled.
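> As a sketch of that pattern (the complex name "prio" and the urgency
> value are hypothetical):

```shell
# 1) Define a FORCED boolean complex. The columns of a complex
#    entry are: name shortcut type relop requestable consumable
#    default urgency.
qconf -sc > /tmp/complexes.txt
echo "prio  pr  BOOL  ==  FORCED  NO  FALSE  1000" >> /tmp/complexes.txt
qconf -Mc /tmp/complexes.txt

# 2) Offer the resource only on the special queue.
qconf -mattr queue complex_values prio=TRUE priority.q

# 3) Only jobs requesting the resource can now land in priority.q,
#    and the request adds 1000 urgency to their scheduling priority.
qsub -l prio=TRUE job.sh
```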
> I'll follow this email up with one that specifically addresses your
> problems after I drop my daughter off with the nanny. :)
> Steve Chapel wrote:
>> Thank you for the response. I have some follow-up questions:
>> 1. Unless I'm missing something, I don't think RQS is what I'm
>> looking for.
>> It's for "large enterprise clusters... to prevent users from
>> consuming all
>> available resources." We don't have a large enterprise cluster, and I
>> do not
>> want to prevent users from consuming all available resources. I do not
>> want to set any limits on users' usage. I only want to enforce "fair"
>> usage so
>> that one user cannot queue up many jobs on the cluster and prevent
>> the other
>> two from doing useful work for days or weeks at a time. If I'm
>> missing how
>> RQS can do that (and allow one user to use all cluster resources when the
>> other two are not using any), let me know. I was under the impression that
>> the functional policy and share tree policy are both used to ensure this.
>> 2. You're suggesting that I could set both weight_tickets_functional and
>> weight_tickets_share to 10000, and then set policy_hierarchy to
>> either OF or
>> OS depending on whether I want functional policy or share tree policy?
>> 3. Yes, I now see I need to set the subordinate_list of priority.q to
>> all.q=1 to ensure that the urgent jobs are run at full speed. Thanks for
>> catching that! If I didn't use that setting, would the urgent jobs
>> still be
>> guaranteed to run, but perhaps not at full speed? If so, perhaps I
>> could set
>> all.q=4 to ensure that they run faster than half speed.
>> In addition, setting queue_sort_method to seq_no in the sconf will
>> help the
>> urgent jobs pile up on a few compute nodes instead of being
>> distributed over
>> many different compute nodes, which would then suspend many all.q jobs
>> because of the previous setting. Okay, that makes sense. Or maybe not,
>> as seq_no seems to decide which queue to use, not which queue
>> instance. But
>> how can I fill one queue from one side and the other queue from the other
>> side? Do I set a different seq_no for each queue instance on each
>> node? I'm not sure if it really matters much, as the cluster will
>> be fully or nearly fully utilized before we attempt to suspend any jobs.
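>> (A sketch of the seq_no idea, with hypothetical hostnames: give the
>> two queues opposing per-instance sequence numbers and sort by seqno.)

```shell
# Fill all.q from one "end" of the cluster and priority.q from
# the other by giving their queue instances opposing seq_no values.
qconf -mattr queue seq_no "10,[node01=1],[node02=2],[node03=3]" all.q
qconf -mattr queue seq_no "10,[node01=3],[node02=2],[node03=1]" priority.q

# Then make the scheduler sort by sequence number instead of load:
# in "qconf -msconf" set
#   queue_sort_method   seqno
```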
>> Some regular priority jobs will need to be suspended when high
>> priority jobs
>> get scheduled. What matters is that the high priority jobs can run
>> immediately, and it doesn't really matter if all the regular priority jobs
>> need to be suspended for that to happen.
>> 4. We used a priority of -20 on a high-priority queue in Dan's SGE session
>> on the last day of the OSGC conference. I guess that priority was just a
>> leftover of the misguided attempt to use process priorities to
>> attempt to
>> set grid engine policy. I also see that the priority does not affect what
>> SGE does, and that SGE sets the priorities of processes itself.
>> I think I'm getting hopelessly confused with setting up the different
>> policies. I'll have to come back and read up on them later. I guess I just
>> don't quite
>> understand how jobs get assigned to queues (what's to prevent regular jobs
>> from going into the priority.q?) and the difference between the queue
>> sequence and the queue instances sequence (how are jobs assigned to
>> different queues vs. how jobs are assigned to slots on compute
>> nodes). It's
>> just becoming a big blur to me. I'll punt on the whole queues thing
>> for now.
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de] Sent: Friday, May 23,
>> 2008 7:26 AM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Questions on functional policy, share tree policy,
>> and priority queue
>> Am 21.05.2008 um 02:29 schrieb Steve Chapel:
>>> I'm new to Sun Grid Engine. I have it set up on a small Rocks cluster
>>> running SGE 6.0 for the three owners of the company I work for, but I
>>> want a
>>> fairer scheduling policy than the default FIFO policy. I want to ensure
>>> that all three owners get approximately equal access to the
>>> cluster. All
>>> jobs are single batch jobs or array batch jobs, in an embarrassingly
>>> parallel situation. Each of the compute nodes has two quad-core CPUs.
>>> In particular, I want to ensure the following:
>>> 1. If there is no contention for CPUs, a user should get access to
>>> as many
>>> CPUs as they need (equal to either the number of jobs they have run
>>> or the
>>> number of CPUs, whichever is less).
>>> 2. If two or more users need to compete for CPUs, they should each
>>> get a
>>> "fair share" of CPU time.
>>> 3. In case of urgent need, we should be able to run jobs that will
>>> suspend any currently running jobs.
>>> It seems to me that I want to set up either a functional policy or
>>> a share
>>> tree policy. In addition, I want a queue that supersedes the
>>> default all.q.
>>> I have read through the documentation, and it looks like this is
>>> what I
>>> should set up:
>>> * Add high-priority queue: qconf -aq priority.q
>>> * Add all.q to priority.q subordinate list: qconf -mattr queue
>>> subordinate_list all.q priority.q
>>> * Set priority of priority.q to highest: qconf -mattr queue
>>> priority -20
>>> * In conf, set auto_user_fshare to 100
>>> * Check that fshare is 100 for all users listed in qconf -suserl
>>> * In sconf, set halftime to 48, compensation_factor to 5
>>> * Add stree: id=0, name=default, type=0, shares=100, childnodes=NONE
>>> * For functional policy: in sconf, set weight_tickets_functional to 10000,
>>> weight_tickets_share to 0
>>> * For share tree policy: in sconf, set weight_tickets_functional to 0,
>>> weight_tickets_share to 10000
>>> Some specific questions I have on these settings are:
>>> 1. Is there a possibility that a user may not be able to use all
>>> CPUs on the cluster even if they have enough jobs to use all CPUs?
>> You will need to set up an RQS to define it per user:
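>> A sketch of such a rule set (the name and the limit value are
>> placeholders to adjust for your site):

```shell
# Create a resource quota set capping each user's slots.
# (Resource quota sets are available from SGE 6.1 on.)
qconf -arqs per_user_slots
# In the editor, a rule set of this shape:
#
#   {
#      name         per_user_slots
#      description  "cap slots per user"
#      enabled      TRUE
#      limit        users {*} to slots=16
#   }
```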
>>> 2. Will the functional and share tree policy settings not interfere
>>> each other as long as either weight_tickets_functional or
>>> weight_tickets_share are 0?
>> You could even set in the scheduler config, to use only one of the
>> two in "policy_hierarchy" and leave one out.
>>> 3. Will submitting jobs to priority.q always immediately cause at
>>> least some
>>> currently running jobs to be suspended? Will it cause all currently
>> Well, in your setup the all.q instance on a node will be suspended
>> when *all* slots of priority.q are filled on a particular machine.
>> You could set up subordinate_list all.q=1,
>> and as soon as one slot is used in priority.q the complete queue
>> instance on this machine is suspended. Possibly blocking 7 other
>> jobs. It's not implemented for now to suspend slots instead of
>> complete queue instances. What I would suggest: fill the cluster
>> from the one side with all.q, and from the other with priority.q to
>> minimize the effect by using sequence numbers and setting up the
>> scheduler to sort queues by seqno instead of load.
>>> jobs to be suspended if a user submits enough jobs?
>>> 4. Does setting the -20 priority of priority.q have an effect?
>> Don't do this! Nice values below zero are reserved for system tasks!
>> For user jobs only use 0 (high) to 19 (low).
>> -- Reuti
>>> 5. Are there any settings I have missed?
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net