[GE users] Strange problem with resource quotas in 6.2u5

icaci hristo at mc.phys.uni-sofia.bg
Mon Mar 8 09:08:01 GMT 2010


Hi, Reuti,

On 08.03.2010, at 01:43, reuti wrote:

> Hi,
> 
> Am 07.03.2010 um 23:56 schrieb icaci:
> 
>> Hello all!
>> 
>> I'm witnessing some odd behaviour of the resource quotas subsystem  
>> of our 6.2u5 installation. We have two types of queues, each one in  
>> both parallel and batch flavour:
>> - for long running jobs (p_long.q and b_long.q);
>> - for jobs with h_rt up to 48 hours (p_med.q and b_med.q).
> 
> what is your current setting of queue_sort_method in the scheduler  
> configuration?
> 

queue_sort_method is set to seqno, and the *_med.q queues are correctly selected for jobs with h_rt < 48:0:0 because their sequence numbers are lower than those of the *_long.q queues.
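
To illustrate the selection described above, here is a minimal Python sketch of picking a queue by ascending sequence number. It is a simplified model, not the scheduler's actual code: the seq_no values (10 and 20), the absence of an h_rt limit on the long queue, and the names `parse_hms`, `QUEUES` and `pick_queue` are all assumptions made for the example.

```python
def parse_hms(text):
    """Convert an h:m:s string such as '47:59:59' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Hypothetical queue settings; None stands for "no h_rt limit".
QUEUES = [
    {"name": "p_long.q", "seq_no": 20, "h_rt": None},
    {"name": "p_med.q", "seq_no": 10, "h_rt": parse_hms("48:0:0")},
]

def pick_queue(queues, requested_h_rt):
    """Return the lowest-seq_no queue whose h_rt limit admits the request."""
    wanted = parse_hms(requested_h_rt)
    for queue in sorted(queues, key=lambda q: q["seq_no"]):
        if queue["h_rt"] is None or wanted <= queue["h_rt"]:
            return queue["name"]
    return None
```

Under these assumptions, a request of 47:59:59 lands in p_med.q, while a request of 72:0:0 falls through to p_long.q.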

> 
>> I want to limit our users to 64 slots in total but give them only 48  
>> slots for long running jobs so I've set up the following resource  
>> quota ruleset:
>> 
>> {
>>  name         users
>>  description  Limits imposed on ordinary users
>>  enabled      TRUE
>>  limit        name long users {*} queues *_long.q to slots=48
>>  limit        name total users {*} to slots=64
>> }
> 
> I think it must be put into two RQS. If you put it into one RQS, you  
> can get 48 slots for jobs in *_long.q plus 64 slots for jobs not  
> running in any *_long.q. Only the first rule which fits the condition  
> is checked. Then the job is either accepted or refused.
> 
> -- Reuti
> 

We have an additional quota set that limits each project in the same manner as we limit each user. I've split all rulesets into separate RQS, and qquota now shows that the limits work as expected, both per user and per project. The output of qstat -j for the sample job no longer reports any exceeded limits either. There are no free slots at the moment, so I can't yet test whether the job actually starts.
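
The difference between one RQS and two can be sketched with a small Python model of the evaluation order Reuti describes: inside a rule set only the first matching rule is checked, and a job has to pass every rule set. This is a simplified illustration for a single user, not SGE's implementation. The names `first_match`, `admit`, `LONG` and `TOTAL` are hypothetical, and the usage accounting (slots are debited against the first matching rule of each set) is an assumption consistent with the "48 plus 64" behaviour mentioned in the quoted reply.

```python
from fnmatch import fnmatch

def first_match(rule_set, queue):
    """Return the first rule in the set whose queue pattern matches, or None."""
    for rule in rule_set:
        if fnmatch(queue, rule["queues"]):
            return rule
    return None

def admit(rule_sets, running, queue, slots):
    """True if a job asking for `slots` in `queue` fits every rule set.

    `running` is a list of (queue, slots) pairs already held by the user.
    """
    for rule_set in rule_sets:
        rule = first_match(rule_set, queue)
        if rule is None:
            continue  # no rule in this set applies to the job
        # Assumption: usage is counted only for jobs debited against this rule.
        used = sum(s for q, s in running if first_match(rule_set, q) is rule)
        if used + slots > rule["limit"]:
            return False
    return True

LONG = {"name": "long", "queues": "*_long.q", "limit": 48}
TOTAL = {"name": "total", "queues": "*", "limit": 64}

one_set = [[LONG, TOTAL]]     # both rules in a single RQS
two_sets = [[LONG], [TOTAL]]  # one rule per RQS

running = [("p_long.q", 48)]  # user already holds 48 slots in a long queue
```

In this model, `admit(one_set, running, "p_med.q", 64)` is accepted, so the user could hold 112 slots in total, whereas `admit(two_sets, running, "p_med.q", 64)` is refused because the 48 long-queue slots also count against the 64-slot rule. A 56-slot job in p_med.q with nothing else running passes both layouts.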

Best regards,

Hristo

> 
>> But when I try to submit a simple 56-slot parallel job with  
>> something like:
>> 
>> echo "sleep 30" | qsub -pe ompix8 56 -l h_rt=47:59:59
>> 
>> the job stays in "qw" state and qstat shows the following:
>> ...
>> cannot run because it exceeds limit "hristo/////" in rule "users/total"
>> cannot run because it exceeds limit "hristo/////" in rule "users/long"
>> ...
>> The 56 slots requirement clearly exceeds the 48 slots limit from the  
>> "users/long" rule, but for some obscure reason SGE thinks that it  
>> also exceeds the 64-slots limit from the "users/total" rule.
>> 
>> I tried to split the ruleset into two separate rulesets:
>> 
>> {
>>  name         users_long
>>  description  Limits imposed on ordinary users
>>  enabled      TRUE
>>  limit        users {*} queues *_long.q to slots=48
>> }
>> {
>>  name         users_total
>>  description  Limits imposed on ordinary users
>>  enabled      TRUE
>>  limit        users {*} to slots=64
>> }
>> 
>> Still no luck:
>> ...
>> cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
>> cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
>> cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
>> ...
>> 
>> The job runs fine if I disable the users_total rule.
>> 
>> We used to run 6.2u2_1 before we upgraded to 6.2u5 and a colleague  
>> of mine insists that he was able to run 56-slots jobs before the  
>> upgrade. Have I stumbled upon a bug in 6.2u5 or did I miss the point  
>> in setting up my resource quotas?
>> 
>> Any help would be greatly appreciated.
>> 
>> Hristo
>> --
>> Dr Hristo Iliev
>> Monte Carlo research group
>> Faculty of Physics, University of Sofia
>> 5 James Bourchier blvd, 1164 Sofia, Bulgaria
>> http://cluster.phys.uni-sofia.bg/hristo/
>> 
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247462
>> 
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>> 
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247465

--
Dr Hristo Iliev
Monte Carlo research group
Faculty of Physics, University of Sofia
5 James Bourchier blvd, 1164 Sofia, Bulgaria
http://cluster.phys.uni-sofia.bg/hristo/

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247501
