[GE users] Questions on functional policy, share tree policy, and priority queue

Daniel Templeton Dan.Templeton at Sun.COM
Tue May 27 18:11:58 BST 2008



Fixed a typo inline...

Daniel Templeton wrote:
> You're going to have a hard time getting SGE to do anything useful 
> without understanding queues. :)
>
> Let's take it from the top.  A queue is where a job runs, not where it 
> waits to run.  When a job is in the qw (queued and waiting) state, it 
> has not yet been assigned to a queue.  A job that has been assigned to 
> a queue is in the r (running) state (or transferring or suspended).  
> In the pre-6.0 days, a queue could only exist on a single host.  With 
> 6.0, we introduced the idea of "cluster queues".  A cluster queue is a 
> queue that can span multiple hosts.  Under the covers, it's 
> essentially a group of pre-6.0 queues, all with the same name, and 
> each on a *different* host.  With one caveat.  A pre-6.0 queue is 
> composed of a long list of required 
> attributes, like slots, pe_list, user_list, etc.  Starting with 6.0, 
> that long list of attributes is only required for the cluster queue.  
> All of the queue instances that belong to that cluster queue inherit 
> the attribute values for it.  The queue instances are allowed, 
> however, to override those attribute values with local settings.  A 
> common example of that is the slots attribute.  When you install an 
> execution daemon using the install_execd script, it will add a slots 
> setting for the queue instance of all.q on that host (noted as 
> all.q@host).  And if it wasn't already clear, pre-6.0 "queue" == 6.x 
> "queue instance".  Post-6.0 "queue" == "cluster queue".
>
> So, aside from governing the number of free slots on a host, what does 
> a queue do?  It controls the execution context of jobs that run in 
> it.  It determines what parallel environments are available, what 
> file, memory, and CPU time limits should be applied, how the job 
> should be started, stopped, suspended, and resumed, what the job's 
> nice value is, etc.  Queues also have a concept of subordination.  A 
> queue that is subordinated to another queue will be suspended (along 
> with all the jobs running in it) when jobs are running in that other 
> queue.  By default, the subordinated queue will be suspended when the 
> other queue is full, but you can set the number of jobs required to 
> suspend the subordinated queue.  1 is a common value, meaning that the 
> subordinated queue should be suspended if any jobs are running in the 
> other queue.  Subordination trees can be arbitrarily complex.  
> Circular subordination schemes are permitted, producing a sort of 
> mutual exclusion effect.
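>
> A minimal sketch of both forms, using the queue names from this 
> thread:
>
>   # qconf -mq priority.q
>   subordinate_list    all.q=1
>
> With "=1", all.q on a host is suspended as soon as one priority.q 
> slot is in use there; a plain "all.q" entry would suspend it only 
> once priority.q is full.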
>
> One other oddity to point out is that the slot count for a queue is 
> not really a queue attribute.  It's actually a queue-level resource.  
> To allow multiple queues on the same host to share that host's CPUs 
> without oversubscribing, you can set the slots resource at the host 
> level.  Doing so sets a host-wide slots limit, and all queues on that 
> host must then share the given number of slots, regardless of how many 
> slots each queue may try to offer.
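>
> For example, to cap a hypothetical eight-core host node01 at eight 
> slots host-wide, regardless of what its queues offer:
>
>   # qconf -me node01
>   complex_values    slots=8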
>
> Since we're talking about resources, let's talk about one of the 
> common queue/resource patterns.  By default, there's nothing (other 
> than access lists) to prevent a stray job from wandering into a 
> queue.  That's bad for queues that govern expensive resources or that 
> represent special access, like a priority queue.  To solve this 
> problem, the most common approach is to create a resource that is 
> "forced".  A forced resource (one that has "FORCED" in the requestable 
> column) has the property that any queue or host that offers that 
> resource can only be used by jobs requesting that resource (or that 
> queue or host, in which case, the resource request is implicit).  By 
> assigning such queues forced resources, you can guarantee that stray 
> jobs can't end up in the queue.  A nice side effect is that you can 
> also assign an urgency to that resource, meaning that jobs requesting 
> that resource (or the queue to which it's assigned) gain (or lose) 
> priority when being scheduled.
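>
> An untested sketch of the whole pattern (the resource name "urgent" 
> is made up):
>
>   # qconf -mc  (add one line to the complex configuration)
>   # name    shortcut  type  relop  requestable  consumable  default  urgency
>   urgent    urg       BOOL  ==     FORCED       NO          0        1000
>
>   # attach it to the queue:
>   qconf -mattr queue complex_values urgent=true priority.q
>
> Jobs then reach priority.q only via "qsub -l urgent=true ..." (or an 
> explicit "-q priority.q"), and the urgency of 1000 raises their 
> priority while they are pending.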
>
> I'll follow this email up with one that specifically addresses your 
> problems after I drop my daughter off with the nanny. :)
>
> Daniel
>
>
> Steve Chapel wrote:
>> Thank you for the response. I have some follow-up questions:
>>
>> 1. Unless I'm missing something, I don't think RQS is what I'm 
>> looking for.
>> It's for "large enterprise clusters... to prevent users from 
>> consuming all
>> available resources" We don't have a large enterprise cluster, and I 
>> do not
>> want to prevent users for consuming all available resources. I do not 
>> want
>> to set any limits on users' usage. I only want to enforce "fair" 
>> usage so
>> that one user cannot queue up many jobs on the cluster and prevent 
>> the other
>> two from doing useful work for days or weeks at a time. If I'm 
>> missing how
>> RQS can do that (and allow one user to use all cluster resources when 
>> the
>> other two are not using any), let me know. I was under the impression 
>> that
>> functional policy or share tree policy are both used to ensure this
>> "fairness".
>>
>> 2. Are you suggesting that I could set both weight_tickets_functional and
>> weight_tickets_share to 10000, and then set policy_hierarchy to 
>> either OF or
>> OS depending on whether I want functional policy or share tree policy?
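>>
>> For example, a scheduler configuration (qconf -msconf) selecting only 
>> the functional policy might look like this sketch:
>>
>>   policy_hierarchy            OF
>>   weight_tickets_functional   10000
>>   weight_tickets_share        0
>>
>> Swapping to OS with the 10000 on weight_tickets_share would select 
>> the share tree policy instead.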
>>
>> 3. Yes, I now see I need to set the subordinate_list of priority.q to
>> all.q=1 to ensure that the urgent jobs are run at full speed. Thanks for
>> catching that! If I didn't use that setting, would the urgent jobs 
>> still be
>> guaranteed to run, but perhaps not at full speed? If so, perhaps I 
>> could set
>> all.q=4 to ensure that they run faster than half speed.
>>
>> In addition, setting queue_sort_method to seq_no in the sconf will 
>> help the
>> urgent jobs pile up on a few compute nodes instead of being 
>> distributed over
>> many different compute nodes, which would then suspend many all.q jobs
>> because of the previous setting. Okay, that makes sense. Or maybe it 
>> doesn't,
>> as seq_no seems to decide which queue to use, not which queue 
>> instance. But
>> how can I fill one queue from one side and the other queue from the 
>> other
>> side? Do I set a different seq_no for each queue instance on each 
>> compute
>> node? I'm not sure if it really matters much, as the cluster will 
>> probably
>> be fully or nearly fully utilized before we attempt to suspend any jobs.
>> Some regular priority jobs will need to be suspended when high 
>> priority jobs
>> get scheduled. What matters is that the high priority jobs can run
>> immediately, and it doesn't really matter if all the regular priority 
>> jobs
>> need to be suspended for that to happen.
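>>
>> As a sketch, the scheduler side of that would be a single setting 
>> (qconf -msconf):
>>
>>   queue_sort_method    seqno
>>
>> The scheduler then fills queue instances in order of their seq_no 
>> values instead of by host load.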
>>
>> 4. We used a priority of -20 on a high-priority queue in Dan's SGE 
>> workshop
>> on the last day of the OSGC conference. I guess that priority was just a
>> leftover of a misguided attempt to use process priorities to 
>> set grid engine policy. I also see that the priority does not affect 
>> what
>> SGE does, and that SGE sets the priorities of processes itself.
>>
>> I think I'm getting hopelessly confused with setting up different 
>> queues.
>> I'll have to come back and read up on them later. I guess I just 
>> don't quite
>> understand how jobs get assigned to queues (what's to prevent regular 
>> jobs
>> from going into the priority.q?) and the difference between the queue
>> sequence and the queue instances sequence (how are jobs assigned to
>> different queues vs. how jobs are assigned to slots on compute 
>> nodes). It's
>> just becoming a big blur to me. I'll punt on the whole queues thing 
>> for now.
>>
>> Thanks,
>> Steve
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Friday, May 23, 2008 7:26 AM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Questions on functional policy, share tree 
>> policy,
>> and priority queue
>>
>> Am 21.05.2008 um 02:29 schrieb Steve Chapel:
>>
>>  
>>> Hi,
>>>
>>> I'm new to Sun Grid Engine. I have it set up on a small Rocks cluster
>>> running SGE 6.0 for the three owners of the company I work for, but  
>>> want a
>>> more fair scheduling policy than the default FIFO policy. I want to  
>>> ensure
>>> that all three owners get approximately equal access to the  
>>> cluster. All
>>> jobs are single batch jobs or array batch jobs, in an embarrassingly
>>> parallel situation. Each of the compute nodes has two quad-core  
>>> processors.
>>>
>>> In particular, I want to ensure the following:
>>> 1. If there is no contention for CPUs, a user should get access to  
>>> as many
>>> CPUs as they need (equal to either the number of jobs they have run  
>>> or the
>>> number of CPUs, whichever is less).
>>> 2. If two or more users need to compete for CPUs, they should each  
>>> get a
>>> "fair share" of CPU time.
>>> 3. In case of urgent need, we should be able to run jobs that will  
>>> preempt
>>> any currently running jobs.
>>>
>>> It seems to me that I want to set up either a functional policy or  
>>> a share
>>> tree policy. In addition, I want a queue that supersedes the  
>>> default all.q.
>>> I have read through the documentation, and it looks like this is  
>>> what I
>>> should set up:
>>> * Add high-priority queue: qconf -aq priority.q
>>> * Add all.q to priority.q subordinate list: qconf -mattr queue
>>> subordinate_list all.q priority.q
>>> * Set priority of priority.q to highest: qconf -mattr queue  
>>> priority -20
>>> priority.q
>>> * In conf, set auto_user_fshare to 100
>>> * Check that fshare is 100 for all users listed in qconf -suserl
>>> * In sconf, set halftime to 48, compensation_factor to 5
>>> * Add stree: id=0, name=default, type=0, shares=100, childnodes=NONE
>>> * For functional policy: in sconf, set weight_tickets_functional to  
>>> 10000,
>>> weight_tickets_share to 0
>>> * For share tree policy: in sconf, set weight_tickets_functional to 0,
>>> weight_tickets_share to 10000
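>>>
>>> For reference, the auto_user_fshare step above maps to these global 
>>> configuration entries (qconf -mconf), a sketch assuming defaults 
>>> elsewhere:
>>>
>>>   enforce_user        auto
>>>   auto_user_fshare    100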
>>>
>>> Some specific questions I have on these settings are:
>>> 1. Is there a possibility that a user may not be able to use all  
>>> available
>>> CPUs on the cluster if there are enough jobs to use all CPUs?
>>>     
>>
>> You will need to setup RQS to define it per user:
>>
>> http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/ResourceQuotaSpecification.html
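>>
>> A hypothetical rule set in the style of that specification (the name 
>> and limit are made up, untested):
>>
>>   {
>>      name         max_slots_per_user
>>      description  "cap any single user at 16 slots"
>>      enabled      TRUE
>>      limit        users {*} to slots=16
>>   }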
>>
>>  
>>> 2. Will the functional and share tree policy settings not interfere  
>>> with
>>> each other as long as either weight_tickets_functional or
>>> weight_tickets_share are 0?
>>>     
>>
>> You could even set "policy_hierarchy" in the scheduler config to  
>> use only one of the two, and leave the other out.
>>
>>  
>>> 3. Will submitting jobs to priority.q always immediately cause at  
>>> least some
>>> currently running jobs to be suspended? Will it cause all currently  
>>> running
>>>     
>>
>> Well, in your setup the all.q instance on a node will be suspended  
>> when *all* slots of priority.q are filled on a particular machine.  
>> You could set up:
>>
>> all.q=1
>>
>> and as soon as one slot is used in priority.q, the complete queue 
>> instance on this machine is suspended, possibly blocking 7 other 
>> jobs. Suspending slots instead of complete queue instances is not 
>> implemented for now. What I would suggest: fill the cluster from 
>> one side with all.q, and from the other with priority.q, to 
>> minimize the effect, by using sequence numbers and setting up the 
>> scheduler to sort queues by seqno instead of load.
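>>
>> A rough, untested sketch with two hypothetical hosts node01 and 
>> node02, giving the two queues opposite orderings:
>>
>>   # qconf -mq all.q
>>   seq_no    10,[node01=1],[node02=2]
>>
>>   # qconf -mq priority.q
>>   seq_no    10,[node01=2],[node02=1]
>>
>> With queue_sort_method set to seqno, all.q fills node01 first while 
>> priority.q fills node02 first, so the two queues only meet on the 
>> same hosts when the cluster is nearly full.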
>>
>>  
>>> jobs to be suspended if a user submits enough jobs?
>>> 4. Does setting the -20 priority of priority.q have an effect?
>>>     
>>
>> Don't do this! Nice values below zero are reserved for system tasks!  
>> For user jobs only use 0 (high) to 19 (low).
>>
>> -- Reuti
>>
>>
>>  
>>> 5. Are there any settings I have missed?
>>>     
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net