[GE users] resource quota question

rumpelkeks tina.friedrich@diamond.ac.uk
Wed May 19 09:37:25 BST 2010


Hi,

> Am 18.05.2010 um 18:48 schrieb rumpelkeks:
> 
>>>> What I'm trying to achieve is restrict resources for a group of users 
>>>> (defined as an access list) so they can only ever use half of the nodes 
>>>> (not the slots per node) of any given host group in my cluster.
>>>>
>>>> So they could run, for example, pe smp jobs using all slots on a host 
>>>> (and submit as many of them as they like) but their jobs would then only 
>>>> be queued on half the nodes - but this should be per any given 
>>>> hostgroup, not across the whole cluster (if that description makes sense).
>>>>
>>>> Is that possible (as a simple resource quota), and what would the syntax 
>>>> be? I tried
>>>>
>>>> {
>>>>   name         half_of_com04
>>>>   description  "resource quota to restrict external collaborators\
>>>> 		slots to 50% slots in com04"
>>>>   enabled      TRUE
>>>>   limit        users {@external_collaborators} queues\
>>>> 		{medium.q@@com04} to slots=56
>>>> }
>>>>
>>>> but that didn't seem to work as it should.
>>> you mean the syntax isn't accepted? What about:
>>>
>>> limit users @external_collaborators queues medium.q hosts @com04 to slots=56
>> I'll try that, thanks - didn't come up with that. I kind of got it to 
>> work as
>>
>> {
>>    name         half_of_com04
>>    description  "resource quota to restrict external collaborators \
>>                  slots on com04 nodes"
>>    enabled      TRUE
>>    limit        users {@external_collaborators} hosts \
>>                 {@com01,@com02,@com03,@com05} to NONE
>>    limit        users {@external_collaborators} queues \
>>                 {bottom.q,low.q,medium.q,high.q} to slots=56
>> }
>> {
>>    name         half_of_com01
>>    description  "resource quota to restrict external collaborators \
>>                 slots on com01 nodes"
>>    enabled      TRUE
>>    limit        users {@external_collaborators} hosts \
>>                 {@com02,@com03,@com04,@com05} to NONE
>>    limit        users {@external_collaborators} queues \
>>                 {bottom.q,low.q,medium.q,high.q} to slots=160
>> }
>>
>> (com01 - com05 being my host groups).
>>
>>> Use of {} would mean to limit it for each inside the list on its
>>> own, i.e. 56 per host per user.
>>
>> Ah. Thanks. That makes sense.
>>
>> Got a new problem now! This works, but isn't quite what we wanted. 
>> Although no fault of the scheduler.
>>
>> This, of course, nicely distributes the restricted users' jobs across all
>> nodes. As quite a lot of my users use smp (not MPI) with the maximum 
>> number of slots on each node, from their point of view it still blocks 
>> the queues.
> 
> but they still request a PE for their jobs? Why do they assume that the nodes are blocked, they are just used.

I know (and they know); it's just that, to them, they are blocked. Yes, 
there's a PE "smp" that they request (with as many slots as the node has 
CPUs). (That is also, btw, how we do 'exclusive node access' - because 
the exclusive complex doesn't suspend subordinate queues, and we need 
that.)
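From the user side, that full-node trick is just a job submission that asks 
for every slot on a node via the smp PE (a sketch for illustration - the 
slot count assumes 8-slot nodes, and the script name is made up):

   # asking for all 8 slots of a node through the smp PE means no other
   # job fits on that node -> effectively exclusive access
   qsub -pe smp 8 myjob.sh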

>> So, I assume I could get around this by setting the scheduler policy to 
>> "fill up", but I am not sure that we really want this (across the whole 
>> cluster, that is).
> 
> 
> 
> With "fillup" you mean this:
> 
> http://blogs.sun.com/sgrell/entry/grid_engine_scheduler_hacks_least ?

Yes.

Maybe if I describe what the problem is it helps.

What I have is (for this particular problem) two users. One's running a 
whole bunch of standalone single-CPU 'batch' jobs. The other has some 
software that requires threading (it can't do MPI) - and his jobs run 
continuously. Meaning it's not that he's got loads of them; it's that 
every single one just never stops. Because they never stop, he's got his 
own queue that is subordinate to all the others (otherwise no one else 
would ever get to run anything).
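(For reference, that subordination is just the subordinate_list attribute 
on each of the other queues' configurations. A minimal sketch - the queue 
name continuous.q is made up, and the =1 threshold, meaning "suspend as 
soon as one slot is in use", is an assumption about our setup - would be, 
in e.g. medium.q via qconf -mq medium.q:

   subordinate_list  continuous.q=1

so his continuous.q jobs get suspended on a host as soon as a medium.q 
slot there is occupied.)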

So, basically, if the guy with the batch jobs comes in with a bunch 
(they run for a couple of days each), the other guy's jobs stop producing 
data. And after a couple of days, he starts complaining. So the two of 
them have asked if I can 'do some magic' so the batch jobs don't take up 
the whole of the cluster... and I thought, well, a quota would be easy. 
Which it was. Only it doesn't help, because the batch jobs still use all 
the nodes. So I'm trying to find a way around this. (Which cannot really 
involve changing the scheduler or global config; the earliest I could do 
that is mid-June.)
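One way that might work without touching the scheduler config is another 
resource quota that simply keeps the batch jobs off part of the cluster, 
in the same style as the rules above (a sketch only - the user name 
batchuser and the choice of host groups are made up for illustration):

{
   name         confine_batch_user
   description  "keep the standalone batch jobs off half the cluster"
   enabled      TRUE
   limit        users {batchuser} hosts {@com04,@com05} to NONE
}

With a rule like that, his jobs never get dispatched to @com04/@com05 
hosts at all, which leaves those nodes free for the threaded jobs, 
whatever else is queued.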

Tina

-- 
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257834