[GE users] Questions on functional policy, share tree policy, and priority queue

Daniel Templeton Dan.Templeton at Sun.COM
Tue May 27 18:07:03 BST 2008


You're going to have a hard time getting SGE to do anything useful 
without understanding queues. :)

Let's take it from the top.  A queue is where a job runs, not where it 
waits to run.  When a job is in the qw (queued and waiting) state, it 
has not yet been assigned to a queue.  A job that has been assigned to a 
queue is in the r (running) state (or transferring or suspended).  In 
the pre-6.0 days, a queue could only exist on a single host.  With 6.0, 
we introduced the idea of "cluster queues".  A cluster queue is a queue 
that can span multiple hosts.  Under the covers, it's essentially a 
group of pre-6.0 queues, all with the same name, and each on a different 
host.  With one caveat.  A pre-6.0 queue is composed of a long list of 
required attributes, like slots, pe_list, user_lists, etc.  Starting with 
6.0, that long list of attributes is only required for the cluster 
queue.  All of the queue instances that belong to that cluster queue 
inherit their attribute values from it.  The queue instances are allowed, 
however, to override those attribute values with local settings.  A 
common example of that is the slots attribute.  When you install an 
execution daemon using the install_execd script, it will add a slots 
setting for the queue instance of all.q on that host (noted as 
all.q@host).  And if it wasn't already clear, pre-6.0 "queue" == 6.x 
"queue instance".  Post-6.0 "queue" == "cluster queue".

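As a minimal sketch of that inheritance rule, here is a toy Python model of a cluster queue and its queue instances.  It is not SGE code: the ClusterQueue class, its methods, and the host names node1 and node2 are made up for illustration, and the attribute values are arbitrary.

```python
# Toy model of cluster queue / queue instance attribute resolution.
# NOT SGE code; the class, method, and host names are illustrative only.

class ClusterQueue:
    def __init__(self, name, **defaults):
        self.name = name
        self.defaults = dict(defaults)  # cluster-queue-wide attribute values
        self.overrides = {}             # host -> {attribute: local value}

    def override(self, host, **attrs):
        """Record local settings for the queue instance on one host."""
        self.overrides.setdefault(host, {}).update(attrs)

    def instance(self, host):
        """Return the effective attributes of the queue instance name@host."""
        effective = dict(self.defaults)                   # inherit everything
        effective.update(self.overrides.get(host, {}))    # then apply local values
        return effective


all_q = ClusterQueue("all.q", slots=1, pe_list="make", priority=0)
all_q.override("node1", slots=8)  # comparable to the per-host slots setting

print(all_q.instance("node1"))  # slots comes from the local override
print(all_q.instance("node2"))  # every value is inherited
```

The point of the sketch is only the lookup order: a queue instance uses its local value if it has one and falls back to the cluster queue's value otherwise.
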
So, aside from governing the number of free slots on a host, what does a 
queue do?  It controls the execution context of jobs that run in it.  It 
determines what parallel environments are available, what file, memory, 
and CPU time limits should be applied, how the job should be started, 
stopped, suspended, and resumed, what the job's nice value is, etc.  
Queues also have a concept of subordination.  A queue that is 
subordinated to another queue will be suspended (along with all the jobs 
running in it) when jobs are running in that other queue.  By default, 
the subordinated queue will be suspended when the other queue is full, 
but you can set the number of jobs required to suspend the subordinated 
queue.  1 is a common value, meaning that the subordinated queue should 
be suspended if any jobs are running in the other queue.  Subordination 
trees can be arbitrarily complex.  Circular subordination schemes are 
permitted, producing a sort of mutual exclusion effect.
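
The threshold rule can be sketched as a small Python function.  This is a toy model under stated assumptions, not SGE's implementation: the function name and arguments are hypothetical, it looks at a single host, and it assumes the threshold counts occupied slots in the superordinate queue instance.

```python
# Toy model of queue subordination on a single host. NOT SGE code.

def suspended_queues(subordinate_list, used_slots, total_slots):
    """Return the set of subordinated queues that should be suspended.

    subordinate_list: {queue name: threshold or None}; None stands for the
        default, "suspend only when the superordinate queue is full".
    used_slots / total_slots: slots in use / configured in the
        superordinate queue instance on this host.
    """
    result = set()
    for queue, threshold in subordinate_list.items():
        limit = total_slots if threshold is None else threshold
        if used_slots >= limit:
            result.add(queue)
    return result


# Default behavior: all.q is suspended only when priority.q is full.
print(suspended_queues({"all.q": None}, used_slots=1, total_slots=8))
print(suspended_queues({"all.q": None}, used_slots=8, total_slots=8))

# Threshold of 1: any job in priority.q suspends all.q on this host.
print(suspended_queues({"all.q": 1}, used_slots=1, total_slots=8))
```
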

One other oddity to point out is that the slot count for a queue is not 
really a queue attribute.  It's actually a queue-level resource.  To 
allow multiple queues on the same host to share that host's CPUs without 
oversubscribing, you can set the slots resource at the host level.  
Doing so sets a host-wide slots limit, and all queues on that host must 
then share the given number of slots, regardless of how many slots each 
queue may try to offer.
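
A rough way to picture the host-wide limit is the following Python sketch.  It is a toy calculation, not how the scheduler is implemented; the function name and the numbers (an 8-slot host with two 8-slot queues) are assumptions chosen to match a two quad-core node.

```python
# Toy model of a host-level slots limit shared by several queues. NOT SGE code.

def free_slots(host_limit, queue_slots, queue_used):
    """Return how many more slots each queue instance on one host can offer.

    host_limit: slots value set at the host level.
    queue_slots: {queue name: slots configured for that queue instance}.
    queue_used: {queue name: slots currently occupied in that queue instance}.
    """
    host_free = host_limit - sum(queue_used.values())
    return {
        queue: max(0, min(queue_slots[queue] - queue_used[queue], host_free))
        for queue in queue_slots
    }


slots = {"all.q": 8, "priority.q": 8}

# Six jobs in all.q leave two slots on the host, whichever queue takes them.
print(free_slots(8, slots, {"all.q": 6, "priority.q": 0}))

# Once the host limit is reached, priority.q has nothing left to offer
# even though its own slot count is untouched.
print(free_slots(8, slots, {"all.q": 8, "priority.q": 0}))
```
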

Since we're talking about resources, let's talk about one of the common 
queue/resource patterns.  By default, there's nothing (other than access 
lists) to prevent a stray job from wandering into a queue.  That's bad 
for queues that govern expensive resources or that represent special 
access, like a priority queue.  To solve this problem, the most common 
approach is to create a resource that is "forced".  A forced resource 
(one that has "FORCED" in the requestable column) has the property that 
any queue or host that offers that resource can only be used by jobs 
requesting that resource (or that queue or host, in which case, the 
resource request is implicit).  By assigning such queues forced 
resources, you can guarantee that stray jobs can't end up in the queue.  
A nice side effect is that you can also assign an urgency to that 
resource, meaning that jobs requesting that resource (or the queue to 
which it's assigned) gain (or lose) priority when being scheduled.
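
The forced-resource rule can be sketched as a simple eligibility check.  This is a toy Python model, not SGE's matching logic: the resource name high_priority, the dictionary layout, and the function name are all hypothetical, and urgency is left out entirely.

```python
# Toy model of forced resources keeping stray jobs out of a queue. NOT SGE code.

FORCED = {"high_priority"}  # hypothetical resource marked FORCED


def may_run_in(queue, job):
    """Return True if the job is allowed into the queue in this toy model."""
    # A job can't go where a resource it requests isn't offered.
    if not job["resources"] <= queue["offers"]:
        return False
    # A job that names queues can only go to one of them.
    if job["queues"] and queue["name"] not in job["queues"]:
        return False
    # Requesting the queue itself makes the forced request implicit.
    if queue["name"] in job["queues"]:
        return True
    # Otherwise every forced resource the queue offers must be requested.
    return (queue["offers"] & FORCED) <= job["resources"]


all_q = {"name": "all.q", "offers": set()}
priority_q = {"name": "priority.q", "offers": {"high_priority"}}

stray = {"resources": set(), "queues": set()}
urgent = {"resources": {"high_priority"}, "queues": set()}
by_queue = {"resources": set(), "queues": {"priority.q"}}

for label, job in [("stray", stray), ("urgent", urgent), ("by_queue", by_queue)]:
    print(label, may_run_in(all_q, job), may_run_in(priority_q, job))
```
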

I'll follow this email up with one that specifically addresses your 
problems after I drop my daughter off with the nanny. :)

Daniel


Steve Chapel wrote:
> Thank you for the response. I have some followup questions
>
> 1. Unless I'm missing something, I don't think RQS is what I'm looking for.
> It's for "large enterprise clusters... to prevent users from consuming all
> available resources". We don't have a large enterprise cluster, and I do not
> want to prevent users from consuming all available resources. I do not want
> to set any limits on users' usage. I only want to enforce "fair" usage so
> that one user cannot queue up many jobs on the cluster and prevent the other
> two from doing useful work for days or weeks at a time. If I'm missing how
> RQS can do that (and allow one user to use all cluster resources when the
> other two are not using any), let me know. I was under the impression that
> functional policy or share tree policy are both used to ensure this
> "fairness".
>
> 2. Are you suggesting that I could set both weight_tickets_functional and
> weight_tickets_share to 10000, and then set policy_hierarchy to either OF or
> OS depending on whether I want functional policy or share tree policy?
>
> 3. Yes, I now see I need to set the subordinate_list of priority.q to
> all.q=1 to ensure that the urgent jobs are run at full speed. Thanks for
> catching that! If I didn't use that setting, would the urgent jobs still be
> guaranteed to run, but perhaps not at full speed? If so, perhaps I could set
> all.q=4 to ensure that they run faster than half speed.
>
> In addition, setting queue_sort_method to seq_no in the sconf will help the
> urgent jobs pile up on a few compute nodes instead of being distributed over
> many different compute nodes, which would then suspend many all.q jobs
> because of the previous setting. Okay, that makes sense. Or maybe it doesn't
> as seq_no seems to decide which queue to use, not which queue instance. But
> how can I fill one queue from one side and the other queue from the other
> side? Do I set a different seq_no for each queue instance on each compute
> node? I'm not sure if it really matters much, as the cluster will probably
> be fully or nearly fully utilized before we attempt to suspend any jobs.
> Some regular priority jobs will need to be suspended when high priority jobs
> get scheduled. What matters is that the high priority jobs can run
> immediately, and it doesn't really matter if all the regular priority jobs
> need to be suspended for that to happen.
>
> 4. We used a priority of -20 on a high-priority queue in Dan's SGE workshop
> on the last day of the OSGC conference. I guess that priority was just a
> leftover of the misguided attempt to use process priorities to attempt to
> set grid engine policy. I also see that the priority does not affect what
> SGE does, and that SGE sets the priorities of processes itself.
>
> I think I'm getting hopelessly confused with setting up different queues.
> I'll have to come back and read up on them later. I guess I just don't quite
> understand how jobs get assigned to queues (what's to prevent regular jobs
> from going into the priority.q?) and the difference between the queue
> sequence and the queue instances sequence (how are jobs assigned to
> different queues vs. how jobs are assigned to slots on compute nodes). It's
> just becoming a big blur to me. I'll punt on the whole queues thing for now.
>
> Thanks,
> Steve
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: Friday, May 23, 2008 7:26 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Questions on functional policy, share tree policy,
> and priority queue
>
> Am 21.05.2008 um 02:29 schrieb Steve Chapel:
>
>   
>> Hi,
>>
>> I'm new to Sun Grid Engine. I have it set up on a small Rocks cluster
>> running SGE 6.0 for the three owners of the company I work for, but want a
>> more fair scheduling policy than the default FIFO policy. I want to ensure
>> that all three owners get approximately equal access to the cluster. All
>> jobs are single batch jobs or array batch jobs, in an embarrassingly
>> parallel situation. Each of the compute nodes has two quad-core processors.
>>
>> In particular, I want to ensure the following:
>> 1. If there is no contention for CPUs, a user should get access to as many
>> CPUs as they need (equal to either the number of jobs they have run or the
>> number of CPUs, whichever is less).
>> 2. If two or more users need to compete for CPUs, they should each get a
>> "fair share" of CPU time.
>> 3. In case of urgent need, we should be able to run jobs that will preempt
>> any currently running jobs.
>>
>> It seems to me that I want to set up either a functional policy or a share
>> tree policy. In addition, I want a queue that supersedes the default all.q.
>> I have read through the documentation, and it looks like this is what I
>> should set up:
>> * Add high-priority queue: qconf -aq priority.q
>> * Add all.q to priority.q subordinate list: qconf -mattr queue
>> subordinate_list all.q priority.q
>> * Set priority of priority.q to highest: qconf -mattr queue priority -20
>> priority.q
>> * In conf, set auto_user_fshare to 100
>> * Check that fshare is 100 for all users listed in qconf -suserl
>> * In sconf, set halftime to 48, compensation_factor to 5
>> * Add stree: id=0, name=default, type=0, shares=100, childnodes=NONE
>> * For functional policy: in sconf, set weight_tickets_functional to 10000,
>> weight_tickets_share to 0
>> * For share tree policy: in sconf, set weight_tickets_functional to 0,
>> weight_tickets_share to 10000
>>
>> Some specific questions I have on these settings are:
>> 1. Is there a possibility that a user may not be able to use all available
>> CPUs on the cluster if there are enough jobs to use all CPUs?
>>     
>
> You will need to set up RQS to define it per user:
>
> http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/ResourceQuotaSpecification.html
>
>   
>> 2. Will the functional and share tree policy settings not interfere with
>> each other as long as either weight_tickets_functional or
>> weight_tickets_share is 0?
>>     
>
> In the scheduler config, you could even list only one of the two in
> "policy_hierarchy" and leave the other out.
>
>   
>> 3. Will submitting jobs to priority.q always immediately cause at least some
>> currently running jobs to be suspended? Will it cause all currently running
>>     
>
> Well, in your setup the all.q instance on a node will be suspended
> when *all* slots of priority.q are filled on a particular machine.
> You could set up:
>
> all.q=1
>
> and as soon as one slot is used in priority.q, the complete queue
> instance on this machine is suspended, possibly blocking 7 other
> jobs. Suspending individual slots instead of complete queue instances
> is not implemented for now. What I would suggest: fill the cluster from
> one side with all.q and from the other with priority.q to minimize
> the effect, by using sequence numbers and setting up the scheduler to
> sort queues by seq_no instead of load.
>
>   
>> jobs to be suspended if a user submits enough jobs?
>> 4. Does setting the -20 priority of priority.q have an effect?
>>     
>
> Don't do this! Nice values below zero are reserved for system tasks!  
> For user jobs only use 0 (high) to 19 (low).
>
> -- Reuti
>
>
>   
>> 5. Are there any settings I have missed?
>>     
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>   
