FW: Re: [GE users] Queue subordination and custom complexes

Roberta Gigon RGigon at slb.com
Wed Apr 2 14:59:30 BST 2008


You were right... I checked which complexes were attached to that host, and slots was set to two.  When I removed it, the subordination started working.  However... there is still a problem.
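
(For reference, a sketch of how that host-level complex can be inspected and cleared with qconf; the complex_values line shown here is assumed from the description above, so the exact output may differ:)

[root@bear ~]$ qconf -se bear1.cl.slb.com    # show the exec host configuration
...
complex_values        slots=2                # the host-level limit described above (assumed)
...
[root@bear ~]$ qconf -me bear1.cl.slb.com    # edit it; set complex_values to NONE to drop the limit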

I submit a job requesting both processors... it runs.

[root@bear ~]$ qsub -q webmi_low.q@bear1.cl.slb.com -pe whole_node 2 /opt/sge/examples/jobs/simple2.sh
Your job 3998 ("simple2.sh") has been submitted
[root@bear ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   3998 0.55500 simple2.sh root         r     04/02/2008 09:22:00 webmi_low.q@bear1.cl.slb.com       2

I submit another job in the same queue requesting both processors... it goes into qw.  Good.

[root@bear ~]$ qsub -q webmi_low.q@bear1.cl.slb.com -pe whole_node 2 /opt/sge/examples/jobs/simple2.sh
Your job 3999 ("simple2.sh") has been submitted
[root@bear ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   3998 0.55500 simple2.sh root         r     04/02/2008 09:22:00 webmi_low.q@bear1.cl.slb.com       2
   3999 0.00000 simple2.sh root         qw    04/02/2008 09:22:07                                    2

I now submit a job which should suspend 3998... it does... Again, good.

[root@bear ~]$ qsub -q nuclear_hi.q@bear1.cl.slb.com /opt/sge/examples/jobs/simple2.sh
Your job 4000 ("simple2.sh") has been submitted
[root@bear ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   4000 0.50500 simple2.sh root         r     04/02/2008 09:22:20 nuclear_hi.q@bear1.cl.slb.com      1
   3998 0.60500 simple2.sh root         S     04/02/2008 09:22:00 webmi_low.q@bear1.cl.slb.com       2
   3999 0.60500 simple2.sh root         qw    04/02/2008 09:22:07                                    2
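
(For reference, this suspension presumably comes from webmi_low.q being listed in nuclear_hi.q's subordinate_list; a sketch of that queue setting, with the value assumed:)

[root@bear ~]$ qconf -sq nuclear_hi.q | grep subordinate
subordinate_list      webmi_low.q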


I submit 2 jobs to another queue on this host.  They both run, thereby oversubscribing the system.

[root@bear ~]$ qsub -q nuclear_low.q@bear1.cl.slb.com /opt/sge/examples/jobs/simple2.sh
Your job 4001 ("simple2.sh") has been submitted
[root@bear ~]$ qsub -q nuclear_low.q@bear1.cl.slb.com /opt/sge/examples/jobs/simple2.sh
Your job 4002 ("simple2.sh") has been submitted
[root@bear ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   4000 0.50500 simple2.sh root         r     04/02/2008 09:22:20 nuclear_hi.q@bear1.cl.slb.com      1
   3998 0.60500 simple2.sh root         S     04/02/2008 09:22:00 webmi_low.q@bear1.cl.slb.com       2
   4001 0.50500 simple2.sh root         r     04/02/2008 09:22:55 nuclear_low.q@bear1.cl.slb.com     1
   4002 0.50500 simple2.sh root         r     04/02/2008 09:23:00 nuclear_low.q@bear1.cl.slb.com     1
   3999 0.60500 simple2.sh root         qw    04/02/2008 09:22:07                                    2
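
(A quick way to confirm the oversubscription is to look at per-queue-instance slot usage and at what the host itself still offers; standard commands, output omitted here:)

[root@bear ~]$ qstat -f                        # used/total slots per queue instance on each host
[root@bear ~]$ qhost -F -h bear1.cl.slb.com    # resources (including any consumables) the host still offers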


So, if I have slots set to 2 on the host, queue subordination won't trigger.  If I don't have it set, my node can be oversubscribed.  How do I prevent the oversubscription, yet keep queue subordination working?
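
(For context, Reuti's suggestion further down this thread is to attach the resource to the queues rather than to the host.  A rough sketch of that approach, with the names and values assumed from this thread; it is only a sketch, not a confirmed fix:)

# Define a consumable in the complex list (qconf -mc); one line per resource:
#   name       shortcut  type  relop  requestable  consumable  default  urgency
#   bearprocs  bp        INT   <=     YES          YES         0        0
# Attach it to each queue instead of the host (qconf -mq webmi_low.q, qconf -mq nuclear_low.q, ...):
#   complex_values        bearprocs=2
# Jobs then request it explicitly:
[root@bear ~]$ qsub -q webmi_low.q@bear1.cl.slb.com -pe whole_node 2 -l bearprocs=2 /opt/sge/examples/jobs/simple2.sh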

Thanks,
Roberta

---------------------------------------------------------------------------------------------
Roberta M. Gigon
Schlumberger-Doll Research
One Hampshire Street, MD-B253
Cambridge, MA 02139
617.768.2099 - phone
617.768.2381 - fax

This message is considered Schlumberger CONFIDENTIAL.  Please treat the information contained herein accordingly.


-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Tuesday, April 01, 2008 6:05 PM
To: Roberta Gigon
Subject: PM: Re: [GE users] Queue subordination and custom complexes

On 01.04.2008 at 23:22, Roberta Gigon wrote:
> The reason I get is:
> (-l NONE) cannot run at host "bear1.cl.slb.com" because it offers only hc:slots=0.000000
>
> Thanks!

You're welcome. But is it working now?!? There was a host complex
which limited the slots, but that should also have been in effect
when you submitted two single-slot jobs. I'm confused.

-- Reuti
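
(For context, the "hc:slots=0.000000" message quoted above is the scheduler's reason for job 3979 staying in qw; it can be seen per job roughly like this, provided the scheduler records that information:)

[root@bear ~]$ qconf -msconf    # scheduler configuration; schedd_job_info must be set to true
[root@bear ~]$ qstat -j 3979    # the "scheduling info:" section then shows messages like the one quoted above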


> Roberta
>
> ---------------------------------------------------------------------------------------------
> Roberta M. Gigon
> Schlumberger-Doll Research
> One Hampshire Street, MD-B253
> Cambridge, MA 02139
> 617.768.2099 - phone
> 617.768.2381 - fax
>
> This message is considered Schlumberger CONFIDENTIAL.  Please treat
> the information contained herein accordingly.
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Tuesday, April 01, 2008 5:08 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Queue subordination and custom complexes
>
> On 01.04.2008 at 22:45, Roberta Gigon wrote:
>> Hi,
>>
>> Perhaps this will help clarify:
>>
>> I submit a job to webmi_low.q requesting both slots using the
>> whole_node pe.  It runs.
>>
>> [root@bear ~]$ qsub -q webmi_low.q@bear1.cl.slb.com -pe whole_node 2 /opt/sge/examples/jobs/simple2.sh
>> Your job 3976 ("simple2.sh") has been submitted
>> [root@bear ~]$ qstat
>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>    3976 0.55500 simple2.sh root         r     04/01/2008 16:31:10 webmi_low.q@bear1.cl.slb.com       2
>>
>> Then, I submit another job requesting both slots.  It goes into qw
>> mode as expected.  This is good!
>>
>> [root@bear ~]$ qsub -q webmi_low.q@bear1.cl.slb.com -pe whole_node 2 /opt/sge/examples/jobs/simple2.sh
>> Your job 3977 ("simple2.sh") has been submitted
>> [root@bear ~]$ qstat
>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>    3976 0.55500 simple2.sh root         r     04/01/2008 16:31:10 webmi_low.q@bear1.cl.slb.com       2
>>    3977 0.00000 simple2.sh root         qw    04/01/2008 16:31:21                                    2
>>
>> Here is where things go badly:  I submit a job into nuclear_hi.q
>> which is supposed to suspend jobs in webmi_low.q.  Instead, it goes
>> into "qw".
>>
>> [root@bear ~]$ qsub -q nuclear_hi.q@bear1.cl.slb.com /opt/sge/examples/jobs/simple2.sh
>> Your job 3979 ("simple2.sh") has been submitted
>> [root@bear ~]$ qstat
>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>    3976 0.60500 simple2.sh root         r     04/01/2008 16:31:10 webmi_low.q@bear1.cl.slb.com       2
>>    3977 0.60500 simple2.sh root         qw    04/01/2008 16:31:21                                    2
>>    3979 0.50500 simple2.sh root         qw    04/01/2008 16:37:43                                    1
>
> Aha, so it never starts at all. Can you check with "qstat -j 3979"
> for the reason?
>
> -- Reuti
>
>
>>
>> However...
>> If I submit two jobs without the -pe flag:
>>
>> [root@bear ~]$ qsub -q webmi_low.q@bear1.cl.slb.com /opt/sge/examples/jobs/simple2.sh
>> Your job 3980 ("simple2.sh") has been submitted
>> [root@bear ~]$ qsub -q webmi_low.q@bear1.cl.slb.com /opt/sge/examples/jobs/simple2.sh
>> Your job 3981 ("simple2.sh") has been submitted
>> [root@bear ~]$ qstat
>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>    3980 0.55500 simple2.sh root         r     04/01/2008 16:40:55 webmi_low.q@bear1.cl.slb.com       1
>>    3981 0.55500 simple2.sh root         r     04/01/2008 16:40:55 webmi_low.q@bear1.cl.slb.com       1
>>
>> And then submit a job into the other queue, the subordination works.
>>
>> [root@bear ~]$ qsub -q nuclear_hi.q@bear1.cl.slb.com /opt/sge/examples/jobs/simple2.sh
>> Your job 3982 ("simple2.sh") has been submitted
>>
>> [root@bear ~]$ qstat
>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>> -----------------------------------------------------------------------------------------------------------------
>>    3982 0.55500 simple2.sh root         r     04/01/2008 16:42:35 nuclear_hi.q@bear1.cl.slb.com      1
>>    3980 0.55500 simple2.sh root         S     04/01/2008 16:40:55 webmi_low.q@bear1.cl.slb.com       1
>>    3981 0.55500 simple2.sh root         S     04/01/2008 16:40:55 webmi_low.q@bear1.cl.slb.com       1
>>
>>
>> Any help you can provide is greatly appreciated!
>>
>>
>> Roberta M. Gigon
>> Schlumberger-Doll Research
>> One Hampshire Street, MD-B253
>> Cambridge, MA 02139
>> 617.768.2099 - phone
>> 617.768.2381 - fax
>>
>> This message is considered Schlumberger CONFIDENTIAL.  Please treat
>> the information contained herein accordingly.
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Tuesday, April 01, 2008 4:04 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Queue subordination and custom complexes
>>
>> On 01.04.2008 at 19:49, Roberta Gigon wrote:
>>> I tried this and what I discovered is when I submit a job into
>>> queue1 with the -pe flag giving me exclusive use of both slots and
>>> then submit another job (with or without the -pe flag) into queue2,
>>> the job in queue1 never gets suspended.
>>>
>>> If, alternatively, I submit two independent jobs into queue1 and
>>> then submit a job into queue2, the job suspension works as expected.
>>
>> What do you mean in detail: is the state in qstat not changing to
>> suspended, or is the PE application not being suspended according to
>> top and/or ps?
>>
>> -- Reuti
>>
>>
>>> Any ideas what is going on here?
>>>
>>> Thanks,
>>> Roberta
>>>
>>> ---------------------------------------------------------------------------------------------
>>> Roberta M. Gigon
>>> Schlumberger-Doll Research
>>> One Hampshire Street, MD-B253
>>> Cambridge, MA 02139
>>> 617.768.2099 - phone
>>> 617.768.2381 - fax
>>>
>>> This message is considered Schlumberger CONFIDENTIAL.  Please treat
>>> the information contained herein accordingly.
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:reuti at Staff.Uni-Marburg.DE]
>>> Sent: Tuesday, April 01, 2008 5:56 AM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] Queue subordination and custom complexes
>>>
>>> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=24049
>>>
>>> On 31.03.2008 at 21:56, Roberta Gigon wrote:
>>>> I have a similar situation and am running into difficulties.
>>>>
>>>> I have queue1 consisting of nodes with two processors.
>>>> I have queue2 consisting of the same nodes, but this queue is
>>>> subordinate to queue1.
>>>>
>>>> I have User A who wants both processors on a node or none at all
>>>> and submits into queue2.
>>>> I have User B who wants only one processor per job and submits into
>>>> queue1.
>>>>
>>>> So... I have User A submit into queue2 using a PE I set up
>>>> (whole_node).  His job runs and does indeed take up both slots in
>>>> that queue.  When User B submits into queue1, his job also runs.
>>>> However, the behavior we are looking for is User A's job should
>>>> suspend and User B's should run.
>>>>
>>>> Next, I tried this: I set up a consumable complex called bearprocs
>>>> and set it to 2 on each host.  Then I had User A submit into queue2
>>>> using -l bearprocs=2.  This worked fine and gave User A exclusive
>>>> use of both processors on the node.  However, now when User B
>>>> submits into queue1, the job remains pending and does not suspend
>>>> User A's job, presumably because the scheduler checks for the
>>>> availability of the consumable bearprocs before looking at
>>>> subordination.
>>>>
>>>> I see the suggestion below from Reuti to attach the complex to the
>>>> queue.  Will this solve my problem as well?  If so, do I need to
>>>> add it to both queue1 and queue2?  If so, how should User B submit
>>>> their job -- -l bearprocs=1?  No -l option?
>>>>
>>>> Thanks,
>>>> Roberta
>>>>
>>>>
>>>> ---------------------------------------------------------------------------------------------
>>>> Roberta M. Gigon
>>>> Schlumberger-Doll Research
>>>> One Hampshire Street, MD-B253
>>>> Cambridge, MA 02139
>>>> 617.768.2099 - phone
>>>> 617.768.2381 - fax
>>>>
>>>> This message is considered Schlumberger CONFIDENTIAL.  Please treat
>>>> the information contained herein accordingly.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>> Sent: Monday, March 31, 2008 1:37 PM
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] Queue subordination and custom complexes
>>>>
>>>> Hi,
>>>>
>>>> On 31.03.2008 at 18:46, David Olbersen wrote:
>>>>> I have the following configuration in my lab cluster:
>>>>>
>>>>> Q1 runs on machines #1, #2, and #3.
>>>>> Q2 runs on the same machines.
>>>>> Q2 is configured to have Q1 as a subordinate.
>>>>> All machines have 2GB of RAM.
>>>>>
>>>>> If I submit 3 jobs to Q1 and 3 to Q2, the expected results are
>>>>> given: jobs start in Q1 (submitted first) then get suspended while
>>>>> jobs in Q2 run.
>>>>>
>>>>> Awesome.
>>>>>
>>>>> Next I try specifying hard resource requirements by adding
>>>>> "-hard -l mem_free=1.5G" to each job. This still ends up working out,
>>>>> probably because the jobs don't actually consume 1.5G of memory.
>>>>> The jobs are simple things that drive up CPU utilization by dd'ing
>>>>> from /dev/urandom out to /dev/null.
>>>>>
>>>>> Next, to further replicate my production environment I add a
>>>>> custom
>>>>> complex named "cores" that gets set on a per-host basis to the
>>>>> number of CPUs the machine has. Please note that we're not using
>>>>> "num_proc" because we want some jobs to use fractions of a CPU and
>>>>> num_proc is an INT.
>>>>>
>>>>> So each job will take up 1 "core" and each job has 1 "core".
>>>>> With this set up the jobs in Q1 run, and the jobs in Q2 wait. No
>>>>> suspension happens at all. Is this because the host resource is
>>>>> actually being consumed? Is there any way to get around this?
>>>>
>>>> Yes, you can check the remaining amount of this complex with
>>>> "qhost -F cores", or per job with "qstat -j <jobid>" (when
>>>> "schedd_job_info true" is set in the scheduler configuration).
>>>> Be aware that only complete queue instances can be suspended,
>>>> not just some of their slots.
>>>>
>>>> What you can do: attach the resource to the queues, not to the
>>>> host.
>>>> Hence every queue supplies the specified amount per node on its
>>>> own.
>>>>
>>>> (Sidenote: to avoid requesting the resource all the time and
>>>> specifying the correct queue in addition, you could also have two
>>>> resources, cores1 and cores2. Attach cores1 to Q1 and likewise
>>>> cores2 to Q2; "qsub -l cores2=1" will then also select the Q2 queue.)
>>>>
>>>> -- Reuti
>>>
>>>
>>
>>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



