[GE users] Queue subordination and custom complexes

David Olbersen dolbersen at nextwave.com
Tue Apr 1 17:28:22 BST 2008


Reuti, 

We want to use a DOUBLE because we consider some of our jobs to use less
than a whole CPU. Some of the jobs we need to run never do very much CPU
processing at all. For example, we have one type of job which we consider
to use 1/4 of a CPU.

The "smaller" jobs only request 1/4 of a CPU via "-l cores=0.25". The
queue these jobs run in has it's slot count set to 16 (4 cores * 4 jobs
per core = 16). However, these machines may also be used by queues which
use whole, or even multiple CPUs. So in this situation, what would I set
the slots attribute to on this machine? 1? 4? 16? It seems impossible to
set it correctly -- if I set it to 16 I can have an over-subscribed (by
your definition) machine. If I set it to 4 I can still have an
over-subscribed machine if some multi-threaded jobs come along. If I set
it to 1 I'll end up wasting resources.
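
(For illustration, a consumable DOUBLE complex of this kind is defined as
one line in the complex configuration, edited with "qconf -mc". The
shortcut and urgency below are placeholders; the default of 1 matches the
per-job default mentioned in the quoted mail further down:)

   #name   shortcut   type     relop   requestable   consumable   default   urgency
   cores   co         DOUBLE   <=      YES           YES          1         0

(A quarter-CPU job then requests it as above, e.g. "qsub -l cores=0.25
job.sh", while a multi-threaded job could request "-l cores=4".)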

-- 
David Olbersen
 

-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Tuesday, April 01, 2008 12:36 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Queue subordination and custom complexes

Am 01.04.2008 um 00:11 schrieb David Olbersen:
> Reuti,
>
>> What you can do: attach the resource to the queues, not to the host.
>> Hence every queue supplies the specified amount per node on its own.
>
> I think you're missing the idea. My "cores" complex is the same as the
> "num_proc" complex, except a DOUBLE instead of an INT. Specifying it on
> a per-queue basis isn't appropriate since I'm trying to over-subscribe
> my hosts. Also, my hosts have varying numbers of cores (2 or 4).

It is appropriate, as it is the limit per queue instance in a queue
definition:

slots                 2,[@p3-1100=1],[node10=1],[node02=1],[node03=1],[node09=1]
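
(For reference, this line sits in the queue configuration; the leading
value is the per-queue-instance default and the bracketed entries
override it per host or host group. With a placeholder queue name:)

   qconf -sq <queue_name> | grep slots    # show the current per-instance limits
   qconf -mq <queue_name>                 # edit them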

But the term "over-subscribe" usually means having more jobs running at
the same time than there are cores in the machine. It seems, though, that
you want to avoid over-subscription.

Therefore you can also set "slots" in each exec host's configuration and
both limits will apply per node (or even use an RQS for it). It just
fills the node from different queues and avoids over-subscription. But if
you want to use subordination (as you stated in your first post), you
mustn't specify it on a per-node basis at all. Just set
"subordinate_list other.q=1" and other.q will get suspended as soon as
one slot is used in the current queue.
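
(A sketch of the two pieces mentioned above, with the queue names from the
first post and placeholder limits. First, a host-level slot cap expressed
as a resource quota set, edited with "qconf -mrqs" (available from SGE
6.1 on):

   {
      name         max_slots_per_host
      description  "at most 4 slots per host, across all queues"
      enabled      TRUE
      limit        hosts {*} to slots=4
   }

Second, subordination configured in the higher-priority queue, here Q2,
via "qconf -mq Q2":

   subordinate_list      Q1=1

With that, Q1 on a host is suspended as soon as one slot of Q2 is in use
there.)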

But I still don't see why you want a DOUBLE for it.

-- Reuti


> To elaborate: we want to give each job a whole CPU to play with. On a 
> 4-processor machine that means only 4 jobs can run.
>
> However, to get the most utilization out of a machine, we may allow
> many queues to run on it, to the point of having 8-12 slots total.
> But if all 8 or 12 slots were full on the one machine, we'd have more
> jobs/CPU than we really want, causing all the jobs to slow down.
>
> To accommodate this situation, each job requires 1 "cores" consumable
> by default. This makes it such that any mixture of jobs from various
> queues can run on the machine, so long as there are still "cores"
> available. It also means that if a job is multi-threaded and needs all
> 4 cores, it can request as much and consume an entire machine.
>
> For example: node-a has 4 CPUs and is in q1, q2, and q3. q1, q2, and
> q3 are set to put 4 slots on each machine they're on. This means that
> node-a has 12 slots, but only 4 CPUs. I set its "cores" complex = 4.
> Now any combination of 4 jobs from queues q1, q2, and q3 can run. This
> gets the most utilization out of the machine.
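
(For illustration, a host-level value like this is set in the exec host
configuration, e.g. "qconf -me node-a" using the names from the example
above, as the line:

   complex_values        cores=4

The scheduler then debits "cores" on node-a regardless of which queue a
job arrives through.)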
>
> So given that this resource has to remain at the node level, are
> there any ways to get around this? Maybe give the resource back when
> the job gets suspended, then take it back when it gets resumed?
>
> --
> David Olbersen
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Monday, March 31, 2008 10:37 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Queue subordination and custom complexes
>
> Hi,
>
> Am 31.03.2008 um 18:46 schrieb David Olbersen:
>> I have the following configuration in my lab cluster:
>>
>> Q1 runs on machines #1, #2, and #3.
>> Q2 runs on the same machines.
>> Q2 is configured to have Q1 as a subordinate.
>> All machines have 2GB of RAM.
>>
>> If I submit 3 jobs to Q1 and 3 to Q2, the expected results are
>> given: jobs start in Q1 (submitted first) then get suspended while 
>> jobs in Q2 run.
>>
>> Awesome.
>>
>> Next I try specifying hard resource requirements by adding
>> "-hard -l mem_free=1.5G" to each job. This still ends up working out,
>> probably because the jobs don't actually consume 1.5G of memory.
>> The jobs are simple things that drive up CPU utilization by dd'ing
>> from /dev/urandom out to /dev/null.
>>
>> Next, to further replicate my production environment I add a custom
>> complex named "cores" that gets set on a per-host basis to the number
>> of CPUs the machine has. Please note that we're not using "num_proc"
>> because we want some jobs to use fractions of a CPU and num_proc is
>> an INT.
>>
>> So each job will take up 1 "core", and each job requests 1 "core" by
>> default. With this setup the jobs in Q1 run, and the jobs in Q2 wait.
>> No suspension happens at all. Is this because the host resource is
>> actually being consumed? Is there any way to get around this?
>
> Yes, you can check the remaining amount of this complex with
> "qhost -F cores", or per job with "qstat -j <jobid>" (when
> "schedd_job_info true" is set in the scheduler configuration). Be
> aware that only complete queues can be suspended, not just some of
> their slots.
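
(For reference, the commands involved; "schedd_job_info" is toggled in
the scheduler configuration:)

   qconf -msconf          # set schedd_job_info to "true"
   qhost -F cores         # show the remaining consumable per host
   qstat -j <jobid>       # scheduling details for a pending job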
>
> What you can do: attach the resource to the queues, not to the host.
> Hence every queue supplies the specified amount per node on its own.
>
> (Side note: to avoid requesting the resource all the time and
> specifying the correct queue in addition, you could also have two
> resources, cores1 and cores2. Attach cores1 to Q1 and cores2 to Q2;
> "qsub -l cores2=1" will then also select the Q2 queue.)
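
(A sketch of that side note, assuming per-node amounts of 4: define
cores1 and cores2 as consumables with "qconf -mc", then attach one to
each queue via its complex_values, e.g. in "qconf -mq Q1":

   complex_values        cores1=4

and in "qconf -mq Q2":

   complex_values        cores2=4

A job submitted with "qsub -l cores2=1 ..." can then only run where Q2
still has cores2 left, so it implicitly lands in Q2.)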
>
> -- Reuti


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net


