[GE users] how to restrict max 2 jobs per user/user group

Sangamesh B forum.san at gmail.com
Fri Nov 21 10:34:50 GMT 2008


On Mon, Nov 17, 2008 at 1:16 PM, reuti <reuti at staff.uni-marburg.de> wrote:
> On 17.11.2008 at 05:28, Sangamesh B wrote:
>
>>> <snip>
>>> which even doesn't apply to your case.
>> Thanks for your valuable suggestion.
>>
>> Following your post, I was able to restrict the number of jobs to 2.
>>
>> 1). Created a complex with value 2 as follows:
>> # qconf -sc
>> #name     shortcut  type  relop  requestable  consumable  default  urgency
>> #------------------------------------------------------------------------
>> external  ext       INT   <=     YES          YES         2        0
>
> The default request should be zero. Otherwise everyone will consume 2
> per slot as a default.
>
Redefined to zero.
>
>> 2). In each external user's home directory created .sge_request with
>> the following content:
>>
>> -l ext=1
>
> Ok.
>
Same as before.
>> A queue external.q with 8 hosts - total 32 slots.
>>
Now the scenario is different. There is no separate queue with 8
hosts/32 slots. The default queue - with all hosts - is used.
>> Added "ext" under complex_values entry, as follows.
>>
>> complex_values        external=2
>
> This would be per queue instance, i.e. every node. Instead it must be
> defined in `qconf -me global` under complex values. And there you
> have to give it 32 as the maximum slots. But as said: this doesn't
> apply to your case, as the maximum number is limited by the defined
> queue anyway. Exception: you decide to have only one queue. Then you
> are done here and can skip the following suspend-setup.
>
Removed complex_values from the queue.
Added "complex_values external=2" with "qconf -me global".
In each external user's home directory, created .sge_request containing
"-l external=1".
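
For reference, the setup described above can be sketched as follows. The column layout is copied from the qconf -sc output quoted earlier; the two grep checks at the end are only a suggested way to confirm the settings and were not part of the original steps.

```shell
# Sketch only; qconf -mc and qconf -me open an editor and need admin rights.

# qconf -mc: complex definition, default request set to 0
#   name      shortcut  type  relop  requestable  consumable  default  urgency
#   external  ext       INT   <=     YES          YES         0        0

# qconf -me global: total units available cluster-wide
#   complex_values   external=2

# ~/.sge_request of each external user: default request added to every job
#   -l external=1

# Suggested check of the result:
qconf -sc | grep external
qconf -se global | grep complex_values
```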

This works for serial jobs only. If 4 serial jobs are submitted
simultaneously, 2 run and 2 wait in the 'qw' state, which is
correct.

But when a parallel job with more than 2 cores is submitted, it stays
in the qw state forever.
I think "complex_values" counts slots, not the number of jobs.

So is there a way to restrict the number of jobs to 2, regardless of
whether they are serial or parallel and how many slots they use?
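
The behaviour is consistent with the consumable being charged once per slot rather than once per job. The small script below is only an illustration of that arithmetic under this assumption; the function names demand and can_ever_run are made up for the example, and the numbers mirror this setup (global external=2, each job requesting external=1).

```shell
#!/bin/sh
# Sketch: how a per-slot consumable would be charged (assumed behaviour).

# Total units a job needs = per-slot request * number of slots.
demand() {  # usage: demand <request_per_slot> <slots>
    echo $(( $1 * $2 ))
}

# A job can only ever be scheduled if its demand fits the configured total.
can_ever_run() {  # usage: can_ever_run <total> <request_per_slot> <slots>
    [ "$(demand "$2" "$3")" -le "$1" ]
}

for slots in 1 2 4 16; do
    if can_ever_run 2 1 "$slots"; then
        echo "slots=$slots needs external=$(demand 1 "$slots"): can run"
    else
        echo "slots=$slots needs external=$(demand 1 "$slots"): stays in qw"
    fi
done
```

Under this reading, two serial jobs use up the 2 units exactly, while any job with more than 2 slots asks for more than the cluster-wide total and can never start.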

I have an idea for keeping a parallel job from taking more than 16 slots:
Define 2 PEs named pe1 and pe2, with 16 slots each.
In the parallel job script,

#$ -pe pe* <slots>

If one job is running with 16 slots, the next job will use the second
pe. (I've tested it, it works)

This way the number of slots per job can be held to <= 16.

If the number of jobs can also be restricted to 2, everything is solved.
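
The two-PE idea can be sketched as below. Only the two relevant PE fields are shown; the slot count of 16 in the request and the script name job.sh are example values, not taken from the actual job.

```shell
# Sketch only. PE definition as shown by "qconf -sp pe1"; pe2 is identical
# apart from pe_name. All other fields keep whatever the site already uses.
#   pe_name   pe1
#   slots     16

# In the job script: the wildcard lets the scheduler pick pe1 or pe2.
#$ -pe pe* 16

# On the command line, quote the pattern so the shell does not expand it
# (job.sh is a placeholder name):
qsub -pe 'pe*' 16 job.sh
```

Since a job runs inside a single PE, a request for more than 16 slots fits neither pe1 nor pe2 and is never started.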
>
>> Still this setup is not complete as the job may take more than 16
>> slots.
>
> Correct. There is no way to limit it, but it's an RFE:
>
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2147
>
>> <snip>
>>> Even if it were possible to set it up right now, I'm not sure
>>> whether it's the best approach. Imagine the external users have two
>>> jobs running and occupying 8 slots in total. So you would like to
>>> allow internal users to use the remaining 24 slots - unfortunately
>>> these are really long jobs. After some while the two external jobs
>>> end and now they want to run 2 jobs with 16 slots each - so using the
>>> granted number of 32 slots. They have to wait. (If you grant them 32
>>> slots, you could also state that it's up to them how to use these for
>>> a mixture of serial and parallel jobs.)
>>>
>>> For a hard-coded setup of the nodes (which you did already as you
>>> mentioned), you could also tell the internal user to run jobs on the
>>> @internal_hgrp nodes, and if they decide to use @external_hgrp, they
>>> can do so if slots are free, but their jobs might get suspended when
>> Is that with the existing setup? How do the jobs get suspended?
>
> -) You can simply define the "default.q" queue for internal users on
> all machines.
>
> -) In the external queue define the "default.q" queue in
> "subordinate_list default.q=1"
>
> -) To give your users a chance to select these queue instances, you
> could define a boolean complex "secondary" (BOOLEAN FORCED) and
> attach it only to the queue instances on these external machines in
> the default.q:
>
> complex_values NONE,[@external_hgrp=secondary=TRUE]
>
> => Normal jobs will run only on the internal nodes.
>
> => Internal users can request the external machines, with the
> possible suspension of the jobs. So they should do it for small jobs:
> qsub -l secondary ...
>
>
I'll treat this as an optimization and look into it later.
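
For reference, the three steps above might translate into the settings below. This is only a sketch: the queue name external.q and the host group @external_hgrp come from the earlier messages, while the shortcut "sec" and the script name small_job.sh are placeholders chosen for the example.

```shell
# Sketch only; each qconf -m* command opens an editor and needs admin rights.

# qconf -mc: forced boolean complex (shortcut "sec" is a placeholder)
#   name       shortcut  type  relop  requestable  consumable  default  urgency
#   secondary  sec       BOOL  ==     FORCED       NO          0        0

# qconf -mq external.q: suspend default.q on a host as soon as one slot of
# external.q is in use there
#   subordinate_list   default.q=1

# qconf -mq default.q: attach the complex only on the external machines
#   complex_values     NONE,[@external_hgrp=secondary=TRUE]

# Internal users opt in to the external machines explicitly
# (small_job.sh is a placeholder name):
qsub -l secondary small_job.sh
```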
>>> the external users decide to run a job there.
>>>
>>> Is this feasible?
>> I think it's very difficult to meet the exact requirement.
>
> True. In 6.1 or 6.2 I would suggest to have only one queue and limit
> the slots for the external users to 32 with an RQS and you are done.
> Although they still can run more than 2 jobs.
>
> -- Reuti
>
Thanks for the suggestions.
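
A resource quota set of the kind mentioned above might look like the sketch below for 6.1 or 6.2. The rule set name external_slots and the user list @external are assumptions for the example; they do not exist in the current setup.

```shell
# Sketch only. Create with "qconf -arqs external_slots", which opens an
# editor; the body of the rule set would be:
#
#   {
#      name         external_slots
#      description  "cap external users at 32 slots in total"
#      enabled      TRUE
#      limit        users @external to slots=32
#   }

# Show the rule set afterwards:
qconf -srqs external_slots
```

Written as "users @external" the 32 slots are shared by the whole list; "users {@external}" would give each member a separate limit of 32. Either way this caps slots, not the number of jobs.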
>
>>>
>>> -- Reuti
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88805
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89326

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


