[GE users] how to restrict max 2 jobs per user/user group

reuti reuti at staff.uni-marburg.de
Fri Nov 21 10:50:25 GMT 2008


Am 21.11.2008 um 11:34 schrieb Sangamesh B:

> On Mon, Nov 17, 2008 at 1:16 PM, reuti <reuti at staff.uni-marburg.de>  
> wrote:
>> Am 17.11.2008 um 05:28 schrieb Sangamesh B:
>>
>>>> <snip>
>>>> which doesn't even apply to your case.
>>> Thanks for your valuable suggestion.
>>>
>>> Following your post, I restricted the number of jobs to 2.
>>>
>>> 1). Created a complex with value 2 as follows:
>>> # qconf -sc
>>> #name      shortcut  type  relop  requestable  consumable  default  urgency
>>> #---------------------------------------------------------------------------
>>> external   ext       INT   <=     YES          YES         2        0
>>
>> The default request should be zero. Otherwise every job will consume
>> 2 per slot by default.
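>>
>> For example, the corrected definition (only the default column changes):
>>
>> #name      shortcut  type  relop  requestable  consumable  default  urgency
>> external   ext       INT   <=     YES          YES         0        0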
>>
> Redefined to zero.
>>
>>> 2). In each external user's home directory created .sge_request with
>>> the following content:
>>>
>>> -l ext=1
>>
>> Ok.
>>
> same..
>>> A queue external.q with 8 hosts - 32 slots in total.
>>>
> Now the scenario is different. There is no separate queue with 8
> hosts/32 slots. The default queue - with all hosts - is used.
>>> Added "ext" under the queue's complex_values entry, as follows:
>>>
>>> complex_values        external=2
>>
>> This would be per queue instance, i.e. every node. Instead it must be
>> defined in `qconf -me global` under complex_values, and there you
>> would give it 32, the maximum number of slots. But as said: this
>> doesn't apply to your case, as the maximum number is limited by the
>> defined queue anyway. Exception: if you decide to have only one queue,
>> then you are done here and can skip the following suspend setup.
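>>
>> Only the relevant lines of the result (as shown by `qconf -se global`):
>>
>> hostname              global
>> complex_values        external=2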
>>
> Removed complex_values from queue.
> Added complex_values external=2 with "qconf -me global".
> In each external user's home directory, created .sge_request with "-l
> external=1".
>
> This is working for serial jobs only.

Correct.


> If 4 serial jobs are submitted
> simultaneously, 2 will be running and 2 will be in 'qw' state. That's
> correct.
>
> But when a parallel job with more than 2 cores is submitted, the job
> stays in the qw state forever.
> I think the "complex_values" refers to slots, not to the number of jobs.

Correct. For parallel ones you will have to request 1/(number of
requested slots), i.e. 0.25 for four requested slots, as the per-slot
request will be multiplied by the slot count.
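
For example (the PE name "mpi" is just a placeholder for one of yours):

# serial job: consumes 1 x 1 = 1 unit of "external"
qsub -l external=1 job.sh

# 4-slot parallel job: the per-slot request is multiplied by the slot
# count, so request 0.25 to consume 4 x 0.25 = 1 unit, like a serial job
qsub -pe mpi 4 -l external=0.25 job.sh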

This will be enhanced in a future version of SGE.

> So is there a way to restrict the number of jobs to 2, irrespective
> of serial/parallel jobs and slot counts?
>
> I have an idea for restricting a parallel job to no more than 16 slots:
> Define 2 PEs, named pe1 & pe2, with 16 slots each.
> In the parallel job script,
>
> #$ -pe pe* <slots>
>
> If one job is running with 16 slots, the next job will use the second
> PE. (I've tested it; it works.)
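>
> A sketch of the relevant PE fields (values assumed; create pe2
> identically and add both PEs to the queue's pe_list):
>
> # qconf -sp pe1
> pe_name            pe1
> slots              16
> allocation_rule    $fill_up
> control_slaves     FALSE
> job_is_first_task  TRUE
>
> # in the job script, the wildcard picks whichever PE has free slots:
> #$ -pe pe* 8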

Good solution for your case, until a more convenient setup is available.

-- Reuti


>
> This way the number of slots can be limited to <=16.
>
> If the number of jobs can also be restricted to 2, then everything is
> solved.
>>
>>> Still, this setup is not complete, as a job may take more than 16
>>> slots.
>>
>> Correct. There is no way to limit it, but it's an RFE:
>>
>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2147
>>
>>> <snip>
>>>> Even if it were possible to set it up right now, I'm not sure
>>>> whether it's the best approach. Imagine the external users have two
>>>> jobs running and occupying 8 slots in total. So you would like to
>>>> allow internal users to use the remaining 24 slots - unfortunately
>>>> these are really long jobs. After a while the two external jobs
>>>> end, and now they want to run 2 jobs with 16 slots each - so using
>>>> the granted number of 32 slots. They have to wait. (If you grant
>>>> them 32 slots, you could also state that it's up to them how to use
>>>> these for a mixture of serial and parallel jobs.)
>>>>
>>>> For a hard-coded setup of the nodes (which you did already, as you
>>>> mentioned), you could also tell the internal users to run jobs on
>>>> the @internal_hgrp nodes, and if they decide to use @external_hgrp,
>>>> they can do so if slots are free, but their jobs might get
>>>> suspended when
>>> Is that with the existing setup? How do the jobs get suspended?
>>
>> -) You can simply define the "default.q" queue for internal users on
>> all machines.
>>
>> -) In the external queue define the "default.q" queue in
>> "subordinate_list default.q=1"
>>
>> -) To give your users a chance to select these queue instances: you
>> could define a boolean complex "secondary" (BOOLEAN FORCED) and
>> attach it only to the queue instances on these external machines in
>> the default.q:
>>
>> complex_values NONE,[@external_hgrp=secondary=TRUE]
>>
>> => Normal jobs will run only on the internal nodes.
>>
>> => Internal users can request the external machines, with the risk
>> that their jobs get suspended. So they should do it only for small
>> jobs: qsub -l secondary ...
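>>
>> The pieces side by side (the complex line is a sketch of the usual
>> `qconf -mc` columns; hostgroup @external_hgrp as above):
>>
>> # qconf -mc : add the forced boolean
>> secondary   sec   BOOL   ==   FORCED   NO   0   0
>>
>> # qconf -mq external.q : suspend default.q jobs when a slot is used here
>> subordinate_list      default.q=1
>>
>> # qconf -mq default.q : offer "secondary" only on the external nodes
>> complex_values        NONE,[@external_hgrp=secondary=TRUE]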
>>
>>
> I'll treat this as an optimization and look into it later.
>>>> the external users decide to run a job there.
>>>>
>>>> Is this feasible?
>>> I think it's very difficult to meet the exact requirement.
>>
>> True. In 6.1 or 6.2 I would suggest having only one queue and
>> limiting the slots for the external users to 32 with an RQS; then you
>> are done, although they can still run more than 2 jobs.
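>>
>> A sketch of such a resource quota set (the userset name @external is
>> an assumption):
>>
>> # qconf -arqs
>> {
>>    name         external_slots
>>    description  cap external users at 32 slots in total
>>    enabled      TRUE
>>    limit        users @external to slots=32
>> }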
>>
>> -- Reuti
>>
> Thanks for the suggestions.
>>
>>>>
>>>> -- Reuti
