[GE users] Queue subordination and custom complexes

Reuti reuti at staff.uni-marburg.de
Tue Apr 1 08:36:19 BST 2008


On 01.04.2008, at 00:11, David Olbersen wrote:
> Reuti,
>
>> What you can do: attach the resource to the queues, not to the host.
>> Hence every queue supplies the specified amount per node on its own.
>
> I think you're missing the idea. My "cores" complex is the same as
> "num_proc" except a DOUBLE instead of an INT. Specifying it on a
> per-queue basis isn't appropriate since I'm trying to over-subscribe
> my hosts. Also, my hosts have varying numbers of cores (2 or 4).

It is appropriate, as it is the limit per queue instance in a queue
definition:

slots                 2,[@p3-1100=1],[node10=1],[node02=1],[node03=1],[node09=1]
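
The same bracket syntax works for any consumable you attach in the
queue definition via complex_values; a minimal sketch (queue and host
names are just examples):

   qconf -mq q1
   ...
   complex_values        cores=4,[node-a=2]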

But the term "over-subscribe" usually means having more jobs running
at the same time than there are cores in the machine, and it seems
you actually want to avoid over-subscription.

Therefore you can also set "slots" in each exec host's configuration,
and both limits will apply per node (or even use an RQS for it). This
just fills the node from different queues while avoiding
oversubscription. But if you want to use subordination (as you stated
in your first post), you mustn't specify it on a per-node basis at
all. Just set "subordinate_list other.q=1" and other.q will get
suspended as soon as one slot is used in the current queue.
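
A minimal sketch of both settings (host and queue names are
placeholders):

   # cap the total slots on the host, regardless of queue:
   qconf -me node-a
   ...
   complex_values        slots=4

   # subordinate other.q to this queue (here assumed to be q2):
   qconf -mq q2
   ...
   subordinate_list      other.q=1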

But I don't see why you want a DOUBLE for it.

-- Reuti


> To elaborate: we want to give each job a whole CPU to play with. On a
> 4-processor machine that means only 4 jobs can run.
>
> However, to get the most utilization out of a machine, we may allow
> many queues to run on it, to the point of having 8-12 slots total.
> However, if all 8 or 12 slots were full on the one machine, we'd have
> more jobs/CPU than we really want, causing all the jobs to slow down.
>
> To accommodate this situation, each job requires 1 "cores" consumable
> by default. This makes it such that any mixture of jobs from various
> queues can run on the machine, so long as there are still "cores"
> available. It also means that if a job is multi-threaded and needs
> all 4 cores, it can request as much and consume an entire machine.
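> (Such a request would look something like "qsub -l cores=4 job.sh",
> with job.sh standing in for the actual job script.)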
>
> For example: node-a has 4 CPUs and is in q1, q2, and q3. q1, q2, and
> q3 are set to put 4 slots on each machine they're on. This means that
> node-a has 12 slots, but only 4 CPUs. I set its "cores" complex = 4.
> Now any combination of 4 jobs from queues q1, q2, and q3 can run.
> This gets the most utilization out of the machine.
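>
> A sketch of that setup (values as in this example; the first snippet
> follows the qconf -mc column layout):
>
>    # qconf -mc -- the line defining the consumable:
>    #name  shortcut  type    relop  requestable  consumable  default  urgency
>    cores  cores     DOUBLE  <=     YES          YES         1        0
>
>    # qconf -me node-a -- attach 4 "cores" to the host:
>    complex_values        cores=4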
>
> So given that this resource has to remain at the node level, are there
> any ways to get around this? Maybe give the resource back when the job
> gets suspended, then take it back when it gets resumed?
>
> -- 
> David Olbersen
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Monday, March 31, 2008 10:37 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Queue subordination and custom complexes
>
> Hi,
>
> On 31.03.2008, at 18:46, David Olbersen wrote:
>> I have the following configuration in my lab cluster:
>>
>> Q1 runs on machines #1, #2, and #3.
>> Q2 runs on the same machines.
>> Q2 is configured to have Q1 as a subordinate.
>> All machines have 2GB of RAM.
>>
>> If I submit 3 jobs to Q1 and 3 to Q2, the expected results are
>> given: jobs start in Q1 (submitted first) then get suspended while
>> jobs in Q2 run.
>>
>> Awesome.
>>
>> Next I try specifying hard resource requirements by adding
>> "-hard -l mem_free=1.5G" to each job. This still ends up working
>> out, probably because the jobs don't actually consume 1.5G of
>> memory. The jobs are simple things that drive up CPU utilization by
>> dd'ing from /dev/urandom out to /dev/null.
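>> (Concretely, something like "qsub -hard -l mem_free=1.5G job.sh",
>> where job.sh is a stand-in for a script running
>> "dd if=/dev/urandom of=/dev/null".)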
>>
>> Next, to further replicate my production environment I add a custom
>> complex named "cores" that gets set on a per-host basis to the
>> number of CPUs the machine has. Please note that we're not using
>> "num_proc" because we want some jobs to use fractions of a CPU and
>> num_proc is an INT.
>>
>> So each job will take up 1 "core", and each host supplies as many
>> "cores" as it has CPUs. With this setup the jobs in Q1 run, and the
>> jobs in Q2 wait. No suspension happens at all. Is this because the
>> host resource is actually being consumed? Is there any way to get
>> around this?
>
> Yes, you can check the remaining amount of this complex with
> "qhost -F cores", or per job with "qstat -j <jobid>" (when
> "schedd_job_info true" is set in the scheduler setup). Be aware that
> only complete queue instances can be suspended, not just some of
> their slots.
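>
> Turning that on looks roughly like this (the scheduler configuration
> is edited in place):
>
>    qconf -msconf
>    ...
>    schedd_job_info   true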
>
> What you can do: attach the resource to the queues, not to the host.
> Hence every queue supplies the specified amount per node on its own.
>
> (Sidenote: to avoid requesting the resource all the time and
> specifying the correct queue in addition, you could also have two
> resources, cores1 and cores2. Attach cores1 to Q1 and cores2 to Q2;
> then "qsub -l cores2=1" will also select the Q2 queue.)
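>
> A rough sketch (queue names as above, job.sh hypothetical):
>
>    # qconf -mq Q1:
>    complex_values        cores1=4
>
>    # qconf -mq Q2:
>    complex_values        cores2=4
>
>    # requesting cores2 implicitly selects Q2:
>    qsub -l cores2=1 job.sh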
>
> -- Reuti


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



