[GE users] Queue subordination and custom complexes

Roberta Gigon RGigon at slb.com
Tue Apr 1 18:49:30 BST 2008


I tried this, and what I discovered is that when I submit a job into queue1 with the -pe flag (giving me exclusive use of both slots) and then submit another job into queue2 (with or without the -pe flag), the job in queue1 never gets suspended.

If, alternatively, I submit two independent jobs into queue1 and then submit a job into queue2, the job suspension works as expected.
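
For concreteness, the submissions look roughly like this (PE and queue
names as in the messages below; the script names are just placeholders):

  # job requesting both slots on one node via the PE, into queue1
  qsub -q queue1 -pe whole_node 2 job_a.sh

  # second job into queue2, with or without the PE request
  qsub -q queue2 job_b.sh
  qsub -q queue2 -pe whole_node 2 job_b.sh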

Any ideas what is going on here?

Thanks,
Roberta

---------------------------------------------------------------------------------------------
Roberta M. Gigon
Schlumberger-Doll Research
One Hampshire Street, MD-B253
Cambridge, MA 02139
617.768.2099 - phone
617.768.2381 - fax

This message is considered Schlumberger CONFIDENTIAL.  Please treat the information contained herein accordingly.

-----Original Message-----
From: Reuti [mailto:reuti at Staff.Uni-Marburg.DE]
Sent: Tuesday, April 01, 2008 5:56 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Queue subordination and custom complexes

http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=24049

On 31.03.2008, at 21:56, Roberta Gigon wrote:
> I have a similar situation and am running into difficulties.
>
> I have queue1 consisting of nodes with two processors.
> I have queue2 consisting of the same nodes, but this queue is
> subordinate to queue1.
>
> I have User A who wants both processors on a node or none at all
> and submits into queue2.
> I have User B who wants only one processor per job and submits into
> queue1.
>
> So... I have User A submit into queue2 using a PE I set up
> (whole_node).  His job runs and does indeed take up both slots in
> that queue.  When User B submits into queue1, his job also runs.
> However, the behavior we are looking for is User A's job should
> suspend and User B's should run.
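>
> For reference, the PE and the subordination I have in mind look
> roughly like this (assuming allocation_rule $pe_slots so that both
> slots land on one host; other settings may differ):
>
>   # qconf -sp whole_node (excerpt)
>   pe_name            whole_node
>   slots              999
>   allocation_rule    $pe_slots
>   control_slaves     FALSE
>
>   # relevant queue settings (qconf -sq ...)
>   # queue2:  pe_list          whole_node
>   # queue1:  subordinate_list queue2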
>
> Next, I tried this: I set up a consumable complex called bearprocs
> and set it to 2 on each host.  Then I had User A submit into queue2
> using -l bearprocs=2.  This worked fine and gave User A exclusive
> use of both processors on the node.  However, now when User B
> submits into queue1, the job remains pending and does not suspend
> User A's job, presumably because the scheduler checks for the
> availability of the consumable bearprocs before looking at
> subordination.
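>
> The bearprocs setup, roughly (the shortcut and the script name are
> just placeholders):
>
>   # complex added via qconf -mc
>   #name       shortcut  type  relop  requestable  consumable  default  urgency
>   bearprocs   bp        INT   <=     YES          YES         0        0
>
>   # on each host, via qconf -me <hostname>
>   complex_values        bearprocs=2
>
>   # User A's submission
>   qsub -q queue2 -l bearprocs=2 job_a.sh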
>
> I see the suggestion below from Reuti to attach the complex to the
> queue.  Will this solve my problem as well?  If so, do I need to
> add it to both queue1 and queue2?  And how should User B then submit
> their job -- with -l bearprocs=1, or with no -l option at all?
>
> Thanks,
> Roberta
>
>
> ---------------------------------------------------------------------------------------------
> Roberta M. Gigon
> Schlumberger-Doll Research
> One Hampshire Street, MD-B253
> Cambridge, MA 02139
> 617.768.2099 - phone
> 617.768.2381 - fax
>
> This message is considered Schlumberger CONFIDENTIAL.  Please treat
> the information contained herein accordingly.
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Monday, March 31, 2008 1:37 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Queue subordination and custom complexes
>
> Hi,
>
> On 31.03.2008, at 18:46, David Olbersen wrote:
>> I have the following configuration in my lab cluster:
>>
>> Q1 runs on machines #1, #2, and #3.
>> Q2 runs on the same machines.
>> Q2 is configured to have Q1 as a subordinate.
>> All machines have 2GB of RAM.
>>
>> If I submit 3 jobs to Q1 and 3 to Q2, the expected results are
>> given: jobs start in Q1 (submitted first) then get suspended while
>> jobs in Q2 run.
>>
>> Awesome.
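>>
>> Roughly, the subordination is just the standard queue setting; an
>> excerpt of what I mean (host group and slot count are only
>> placeholders):
>>
>>   # qconf -sq Q2 (excerpt)
>>   qname              Q2
>>   hostlist           @allhosts
>>   slots              2
>>   subordinate_list   Q1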
>>
>> Next I try specifying hard resource requirements by adding
>> "-hard -l mem_free=1.5G" to each job. This still ends up working out,
>> probably because the jobs don't actually consume 1.5G of memory.
>> The jobs are simple things that drive up CPU utilization by dd'ing
>> from /dev/urandom out to /dev/null.
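>>
>> I.e. each job is essentially (the script name is just an example):
>>
>>   #!/bin/sh
>>   # pure CPU load, negligible memory use
>>   dd if=/dev/urandom of=/dev/null
>>
>> submitted as:
>>
>>   qsub -hard -l mem_free=1.5G burn.sh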
>>
>> Next, to further replicate my production environment I add a custom
>> complex named "cores" that gets set on a per-host basis to the
>> number of CPUs the machine has. Please note that we're not using
>> "num_proc" because we want some jobs to use fractions of a CPU and
>> num_proc is an INT.
>>
>> So each job will take up 1 "core" and each host has 1 "core".
>> With this setup, the jobs in Q1 run and the jobs in Q2 wait.  No
>> suspension happens at all.  Is this because the host resource is
>> actually being consumed?  Is there any way to get around this?
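>>
>> For reference, "cores" is defined roughly like this (the shortcut and
>> urgency values are only examples):
>>
>>   # complex definition, added via qconf -mc
>>   #name   shortcut  type    relop  requestable  consumable  default  urgency
>>   cores   cr        DOUBLE  <=     YES          YES         0        0
>>
>>   # per exec host, via qconf -me <hostname>
>>   complex_values        cores=1
>>
>> and each job requests it with "-l cores=1".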
>
> Yes, you can check the remaining amount of this complex with "qhost
> -F cores", or per job with "qstat -j <jobid>" (when "schedd_job_info
> true" is set in the scheduler configuration). Be aware that only
> complete queues can be suspended, not just some of their slots.
>
> What you can do: attach the resource to the queues instead of to the
> host. Then every queue supplies the specified amount per node on its
> own.
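>
> I.e. in each queue's configuration (qconf -mq Q1, qconf -mq Q2), with
> the value per queue instance only as an example, and with "cores"
> removed from the hosts' complex_values:
>
>   complex_values        cores=2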
>
> (Sidenote: to avoid having to request the resource and, in addition,
> specify the correct queue every time, you could also define two
> resources, cores1 and cores2, and attach cores1 to Q1 and cores2 to
> Q2. "qsub -l cores2=1" will then also select the Q2 queue.)
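>
> A sketch of that variant (the script name is just an example):
>
>   # qconf -mq Q1:   complex_values   cores1=2
>   # qconf -mq Q2:   complex_values   cores2=2
>
>   # requesting cores2 selects Q2 without needing -q:
>   qsub -l cores2=1 job.sh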
>
> -- Reuti
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net


More information about the gridengine-users mailing list