[GE users] resource quota question

rumpelkeks tina.friedrich at diamond.ac.uk
Wed May 19 14:52:14 BST 2010


>> Not sure - what are the exact implications? What it says on the box, 
>> exclusive access to that queue? So it would suspend all jobs in 
>> subordinate queues and give that qsub exclusive queue access? In which 
>> case I don't think so. It is not supposed to use all nodes in the queue, 
>> just a subset. (I don't want to have a queue for every problem, really; 
>> I'm trying to avoid that.)
> Aha, what about this: remove the subordination (hence fill both queues;
> an adjustment to any total slot count may also be necessary).
> If I understood you correctly, the endless application doesn't generate load all the time.
> If it starts to generate load, you could use a suspend_threshold (for the user with
> the endless job) so that it suspends itself when its load plus that of the normal queue
> exceeds a limit. If he is alone on the machine, his job will continue.
> I think you already have two queues (a normal one and one for the special user) anyway.

Interesting suggestion. Not sure it works, but I would need to try. The 
special user's application - the one that just keeps running - does 
create load all the time (and always uses all available slots, I 
believe, so at the moment no further jobs would be scheduled onto them 
anyway, but that could be changed).

What we have is four queues, actually: Bottom, Low, Medium, High. Bottom 
is subordinate to everything, Low to Medium and High, Medium to High (you 
get the picture). Standard requests submit to Medium (which has a time 
limit). Low is without a time limit. Bottom is the one for the special user 
(I call him my background noise). High can only be used by our data acquisition 
software, as this must take precedence in whatever situation. The main 
requirement on the cluster is that whenever data is taken and some 
special data reduction software is run, this must be run instantly 
('real time' data processing), across as much of the cluster as 
possible. So High goes and suspends everything else (pretty quickly). 
That's also what needed the 'exclusive' most.
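
To make the hierarchy concrete, here is a rough sketch of it as a toy model (plain Python, not SGE configuration). The queue names follow the description above; the SUBORDINATES table and the suspended_queues helper are made up for illustration, and the model ignores that real subordination is evaluated per host and by slot usage.

```python
# Toy model of the four-queue hierarchy described above.
# Each queue maps to the queues that are subordinate to it.
# (Hypothetical names; this is not how SGE stores its configuration.)
SUBORDINATES = {
    "high": ["medium", "low", "bottom"],
    "medium": ["low", "bottom"],
    "low": ["bottom"],
    "bottom": [],
}


def suspended_queues(busy):
    """Return the set of queues that get suspended when the queues
    in `busy` have jobs running."""
    suspended = set()
    for queue in busy:
        suspended.update(SUBORDINATES[queue])
    return suspended


if __name__ == "__main__":
    # Data acquisition starts in High: everything else is suspended.
    print(sorted(suspended_queues({"high"})))
    # Only Low is busy: just the background user in Bottom is suspended.
    print(sorted(suspended_queues({"low"})))
```

The second case is the one causing trouble here: any job in Low suspends Bottom outright, so the two cannot share a host.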

People are quite happy with that, it seems. I was just (at the 
moment) trying to solve a problem of sharing between the Low and Bottom 
queues, i.e. trying to make a user of Low 'share' resources with Bottom - 
so not actually what we should be doing. I like the idea of getting rid 
of Bottom and putting the 'continuous jobs' user back into Low, with 
appropriate thresholds set. That would work just fine, I guess. Thanks 
for setting my head right!
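
As I understand the threshold idea, the decision boils down to something like the sketch below. This is only an illustration in Python, not SGE code: should_suspend, the per-core normalisation and the numbers are all assumptions, and the real scheduler works from the load values the execution hosts report.

```python
def should_suspend(background_load, other_load, cores, threshold):
    """Decide whether the continuous (background) job should be suspended.

    The combined load of the background job and the other jobs on the
    host is normalised by the number of cores and compared against a
    threshold. All parameter names are hypothetical.
    """
    normalised = (background_load + other_load) / cores
    return normalised > threshold


if __name__ == "__main__":
    # Background job alone on an 8-core host: it keeps running.
    print(should_suspend(8.0, 0.0, cores=8, threshold=1.1))
    # Other jobs add load on the same host: the background job is suspended.
    print(should_suspend(8.0, 2.0, cores=8, threshold=1.1))
```

So as long as the background user is alone on a machine, his job carries on; it only backs off once other work pushes the host over the limit.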

Btw (although off-topic here) - I want to set up a test cluster so I can 
test even scheduler changes more freely in the future. Does that really 
require a second SGE installation, or is a second cell sufficient (i.e. 
is any configuration above cell level)?

Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442

