[GE users] resource quota question

rumpelkeks tina.friedrich at diamond.ac.uk
Wed May 19 15:20:51 BST 2010


Hi,

reuti wrote:
> Hi,
> 
> On 19.05.2010 at 15:52, rumpelkeks wrote:
> 
>> Hi,
>>
>> <snip>
>>>> Not sure - what are the exact implications? What it says on the box, 
>>>> exclusive access to that queue? So it would suspend all jobs in 
>>>> subordinate queues and give that qsub exclusive queue access? In which 
>>>> case I don't think so. It is not supposed to use all nodes in the queue, 
>>>> just a subset. (I don't want to have a queue for every problem, really; 
>>>> I'm trying to avoid that.)
>>> Aha, what about this: remove the subordination (hence fill both queues; 
>>> maybe an adjustment to any total slot count is also necessary).
>>> If I got you right, the endless application won't generate load all the time. 
>>> If it starts to generate load, you could use a suspend_threshold (for the user with 
>>> the endless job) so that his job is suspended when its load plus that of the normal queue 
>>> exceeds a limit. If he is alone on the machine, his job will continue.
>>>
>>> I think you have already two queues (a normal one and one for the special user) anyway.
>> </snip>
>>
>> Interesting suggestion. Not sure it works, but would need to try. The 
>> special user's application - the one that just keeps running - does 
>> create load all the time (and always uses all available slots, I 
>> believe), so at the moment no further jobs would be scheduled onto them 
>> anyway (but that could be changed).
>>
>> What we have is four queues, actually. Bottom, Low, Medium, High. Bottom 
>> subordinate to everything, Low to Medium and High, Medium to High (you 
>> get the picture). Standard requests submit to Medium (which has a time limit). 
>> Low is without a time limit. Bottom is the one for the special user (I call 
>> him my background noise). High can only be used by our data acquisition 
>> software, as this must take precedence in whatever situation. The main 
>> requirement on the cluster is that whenever data is taken and some 
>> special data reduction software is run, this must be run instantly 
>> ('real time' data processing), across as much of the cluster as 
>> possible. So High goes and suspends everything else (pretty quickly). 
>> That's also what needed the 'exclusive' most.
>>
>> People are quite happy with that, it seems really. I was just (at the 
>> moment) trying to solve a problem of sharing between the low and bottom 
>> queues, so trying to make a user of low 'share' resources with bottom - 
>> so not actually what we should be doing. I like the idea of getting rid 
>> of bottom and putting the 'continuous jobs' user back into low, with 
>> appropriate thresholds set. That would work just fine, I guess. Thanks 
>> for setting my head right!
>>
>> Btw (although off-topic here) - I want to set up a test cluster so I can 
>> test even scheduler changes more freely in the future. Does that really 
>> require a second SGE installation, or is a second cell sufficient (i.e. 
>> is any configuration above cell level)?
> 
> A new cell is sufficient. The cells share one SGE installation but don't know anything about each other, and each uses different ports for communication. Hence you will need to source the correct `settings.sh` from the cell you want to use for your SGE commands.
> 
> I even have two older machines set aside just for a mini cluster to test things.

I'll probably have a couple of virtual machines or so. That's alright 
then - we're not running with the 'default' cell as it is, so I know how 
to handle the settings etc. setup.
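
For the archive, a minimal sketch of switching one shell between cells. 
It assumes the usual layout where each cell keeps its own file at 
<sge_root>/<cell>/common/settings.sh. The function names, /opt/sge and 
testcell are made up for illustration and are not part of SGE.

```shell
#!/bin/sh
# Hypothetical helper: print the path of a cell's settings file.
# Assumes the layout <sge_root>/<cell>/common/settings.sh.
sge_settings_path() {
    printf '%s/%s/common/settings.sh\n' "$1" "$2"
}

# Hypothetical helper: point the current shell at the given cell by
# sourcing that cell's settings file. Returns 1 if the file is missing.
use_sge_cell() {
    settings=$(sge_settings_path "$1" "$2")
    if [ -r "$settings" ]; then
        # Sourcing (not executing) so the variables land in this shell.
        . "$settings"
    else
        echo "no readable settings file at $settings" >&2
        return 1
    fi
}
```

Something like `use_sge_cell /opt/sge testcell && qstat -f` would then 
talk to the test cell only, and a fresh shell (or sourcing the other 
cell's file) gets you back to production.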

> 
> -- Reuti
> 
> 
>> -- 
>> Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
>> Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257861
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


-- 
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257869
