[GE users] Queue subordination and custom complexes

Reuti reuti at staff.uni-marburg.de
Tue Apr 1 21:09:33 BST 2008


On 01.04.2008, at 18:28, David Olbersen wrote:
> Reuti,
>
> We want to use a DOUBLE because we consider some of our jobs to use
> less than a whole CPU. Some of our jobs never do much CPU processing
> at all; for example, we have one type of job which we consider to use
> 1/4 of a CPU.
>
> The "smaller" jobs only request 1/4 of a CPU via "-l cores=0.25". The
> queue these jobs run in has its slot count set to 16 (4 cores * 4 jobs
> per core = 16). However, these machines may also be used by queues
> which use whole, or even multiple, CPUs. So in this situation, what
> would I set the slots attribute to on this machine? 1? 4? 16? It seems
> impossible to set it correctly -- if I set it to 16 I can have an
> over-subscribed (by your definition) machine. If I set it to 4 I can
> still have an over-subscribed machine if some multi-threaded jobs come
> along. If I set it to 1 I'll end up wasting resources.
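
For reference, a minimal sketch of how such a DOUBLE consumable could be
defined (the attribute values and the job script name are illustrative
assumptions, not David's actual configuration):

    # qconf -mc -- add one line to the complex list:
    #name   shortcut  type    relop  requestable  consumable  default  urgency
    cores   cores     DOUBLE  <=     YES          YES         1        0

    # a "smaller" job then requests a fraction of a core:
    qsub -l cores=0.25 job.sh

With default=1, any job that requests nothing still consumes one whole
core.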

So, contrary to your first post, you no longer want to use
subordination, where only one queue is active at a given point in time
and the others are suspended?

-- Reuti


> -- 
> David Olbersen
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Tuesday, April 01, 2008 12:36 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Queue subordination and custom complexes
>
> On 01.04.2008, at 00:11, David Olbersen wrote:
>> Reuti,
>>
>>> What you can do: attach the resource to the queues, not to the host.
>>> Hence every queue supplies the specified amount per node on its own.
>>
>> I think you're missing the idea. My "cores" complex is the same as
>> "num_proc", except a DOUBLE instead of an INT. Specifying it on a
>> per-queue basis isn't appropriate since I'm trying to over-subscribe
>> my hosts. Also, my hosts have varying numbers of cores (2 or 4).
>
> It is appropriate, as it is the limit per queue instance in a queue
> definition:
>
> slots                 2,[@p3-1100=1],[node10=1],[node02=1],[node03=1],[node09=1]
>
> But the term "over-subscribe" usually means having more jobs running
> at the same time than there are cores in the machine. It seems you
> actually want to avoid over-subscription.
>
> Therefore you can also set "slots" in each exec host's configuration,
> and both limits will apply per node (or even use an RQS for it). It
> just fills the node from different queues and avoids oversubscription.
> But if you want to use subordination (as you stated in your first
> post), you mustn't specify it on a per-node basis at all. Just set
> "subordinate_list other.q=1" and other.q will get suspended as soon as
> one slot is used in the current queue.
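
A sketch of the two alternatives just described (the RQS name and the
queue placeholder are made up; other.q is from the thread):

    # subordination: suspend other.q as soon as one slot is busy here
    # (in "qconf -mq <current_queue>"):
    subordinate_list      other.q=1

    # or, without subordination: cap the total slots per host with a
    # resource quota set ("qconf -arqs"):
    {
       name         max_slots_per_host
       description  "cap slots used on any host across all queues"
       enabled      TRUE
       limit        hosts {*} to slots=4
    }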
>
> But I don't see why you want to have a DOUBLE for it.
>
> -- Reuti
>
>
>> To elaborate: we want to give each job a whole CPU to play with. On a
>> 4-processor machine that means only 4 jobs can run.
>>
>> However, to get the most utilization out of a machine, we may allow
>> many queues to run on it, to the point of having 8-12 slots total. If
>> all 8 or 12 slots were full on the one machine, we'd have more
>> jobs/CPU than we really want, causing all the jobs to slow down.
>>
>> To accommodate this situation, each job requires 1 "cores" consumable
>> by default. This makes it such that any mixture of jobs from various
>> queues can run on the machine, so long as there are still "cores"
>> available. It also means that if a job is multi-threaded and needs
>> all 4 cores, it can request as many and consume an entire machine.
>>
>> For example: node-a has 4 CPUs and is in q1, q2, and q3. q1, q2, and
>> q3 are set to put 4 slots on each machine they're on. This means that
>> node-a has 12 slots, but only 4 CPUs. I set its "cores" complex = 4.
>> Now any combination of 4 jobs from queues q1, q2, and q3 can run.
>> This gets the most utilization out of the machine.
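
As configuration, that example might look like this (a sketch; big.sh
is a placeholder job script):

    # q1, q2 and q3 each put 4 slots on node-a -> 12 slots total
    # (in "qconf -mq q1", likewise for q2 and q3):
    slots                 4

    # but node-a only backs 4 of those slots with real CPU
    # (in "qconf -me node-a"):
    complex_values        cores=4

    # a multi-threaded job can still claim the whole machine:
    qsub -l cores=4 big.sh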
>>
>> So given that this resource has to remain at the node level, are
>> there any ways to get around this? Maybe give the resource back when
>> the job gets suspended, then consume it again when the job gets
>> resumed?
>>
>> --
>> David Olbersen
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Monday, March 31, 2008 10:37 AM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Queue subordination and custom complexes
>>
>> Hi,
>>
>> On 31.03.2008, at 18:46, David Olbersen wrote:
>>> I have the following configuration in my lab cluster:
>>>
>>> Q1 runs on machines #1, #2, and #3.
>>> Q2 runs on the same machines.
>>> Q2 is configured to have Q1 as a subordinate.
>>> All machines have 2GB of RAM.
>>>
>>> If I submit 3 jobs to Q1 and 3 to Q2, I get the expected results:
>>> jobs start in Q1 (submitted first), then get suspended while the
>>> jobs in Q2 run.
>>>
>>> Awesome.
>>>
>>> Next I try specifying hard resource requirements by adding
>>> "-hard -l mem_free=1.5G" to each job. This still ends up working
>>> out, probably because the jobs don't actually consume 1.5G of
>>> memory. The jobs are simple things that drive up CPU utilization by
>>> dd'ing from /dev/urandom out to /dev/null.
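
Spelled out, that test could look like this (burn.sh is a placeholder
name; the contents follow the description above):

    #!/bin/sh
    # burn.sh -- drive up CPU by copying /dev/urandom to /dev/null
    dd if=/dev/urandom of=/dev/null

    # submit with the hard resource request:
    qsub -hard -l mem_free=1.5G burn.sh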
>>>
>>> Next, to further replicate my production environment, I add a custom
>>> complex named "cores" that gets set on a per-host basis to the
>>> number of CPUs the machine has. Please note that we're not using
>>> "num_proc" because we want some jobs to use fractions of a CPU and
>>> num_proc is an INT.
>>>
>>> So each job will take up 1 "core" by default. With this setup the
>>> jobs in Q1 run, and the jobs in Q2 wait. No suspension happens at
>>> all. Is this because the host resource is actually being consumed?
>>> Is there any way to get around this?
>>
>> Yes, you can check the remaining amount of this complex with "qhost
>> -F cores", or also per job with "qstat -j <jobid>" (when
>> "schedd_job_info true" is set in the scheduler setup). Be aware that
>> only complete queues can be suspended, not just some of their slots.
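
The checks just mentioned, spelled out (<jobid> stays a placeholder):

    # remaining capacity of the consumable on each host
    qhost -F cores

    # turn on per-job scheduling info ("schedd_job_info  true" via
    # "qconf -msconf"), then inspect a waiting job:
    qstat -j <jobid>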
>>
>> What you can do: attach the resource to the queues, not to the host.
>> Hence every queue supplies the specified amount per node on its own.
>>
>> (Sidenote: to avoid requesting the resource all the time and
>> specifying the correct queue in addition, you could also have two
>> resources, cores1 and cores2. Attach cores1 to Q1 and likewise cores2
>> to Q2. "qsub -l cores2=1" will then automatically end up in the Q2
>> queue.)
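
A sketch of that two-resource variant, assuming cores1 and cores2 were
created the same way as the cores complex (job.sh is a placeholder):

    # each queue supplies its own consumable
    # (in "qconf -mq Q1"):
    complex_values        cores1=4
    # (in "qconf -mq Q2"):
    complex_values        cores2=4

    # requesting cores2 implicitly selects Q2:
    qsub -l cores2=1 job.sh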
>>
>> -- Reuti


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



