[GE users] Strange problem with resource quotas in 6.2u5

reuti reuti at staff.uni-marburg.de
Sun Mar 7 23:43:07 GMT 2010


Hi,

On 07.03.2010 at 23:56, icaci wrote:

> Hello all!
>
> I'm witnessing some odd behaviour of the resource quotas subsystem  
> of our 6.2u5 installation. We have two types of queues, each one in  
> both parallel and batch flavour:
> - for long running jobs (p_long.q and b_long.q);
> - for jobs with h_rt up to 48 hours (p_med.q and b_med.q).

What is your current setting of queue_sort_method in the scheduler
configuration?


> I want to limit our users to 64 slots in total but give them only 48  
> slots for long running jobs so I've set up the following resource  
> quota ruleset:
>
> {
>   name         users
>   description  Limits imposed on ordinary users
>   enabled      TRUE
>   limit        name long users {*} queues *_long.q to slots=48
>   limit        name total users {*} to slots=64
> }

I think the two limits must be put into two separate RQS (resource
quota sets). If you put both into one RQS, a user can get 48 slots for
jobs in *_long.q plus 64 slots for jobs not running in any *_long.q.
Within one RQS, only the first rule whose condition matches is checked;
the job is then accepted or refused based on that rule alone.
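
To illustrate the first-match behaviour described above, here is a
small Python sketch. It is only a simplified model under my own
assumptions, not SGE's scheduler code: the names first_match,
slots_debited and admit are made up for this example, each job is
assumed to run in a single queue, and each running job is assumed to be
debited against the first matching rule of every ruleset.

```python
from fnmatch import fnmatch

# A rule is (name, queue_pattern, slot_limit); a ruleset is a list of rules.
# Hypothetical toy model of per-user slot quotas, not the real SGE logic.

def first_match(ruleset, queue):
    """Return (name, limit) of the first rule matching the queue, or None."""
    for name, pattern, limit in ruleset:
        if fnmatch(queue, pattern):
            return name, limit
    return None

def slots_debited(ruleset, running, rule_name):
    """Sum the slots of running jobs whose first matching rule is rule_name."""
    total = 0
    for queue, slots in running:
        match = first_match(ruleset, queue)
        if match is not None and match[0] == rule_name:
            total += slots
    return total

def admit(rulesets, running, queue, slots):
    """True if a job asking for `slots` in `queue` fits every ruleset."""
    for ruleset in rulesets:
        match = first_match(ruleset, queue)
        if match is None:
            continue  # no rule of this ruleset applies to the job
        name, limit = match
        if slots_debited(ruleset, running, name) + slots > limit:
            return False
    return True

# Both limits in one ruleset versus one ruleset per limit.
one_rqs = [[("long", "*_long.q", 48), ("total", "*", 64)]]
two_rqs = [[("long", "*_long.q", 48)], [("total", "*", 64)]]

running = [("p_long.q", 48)]  # the user already holds 48 slots in a long queue

print(admit(one_rqs, running, "p_med.q", 64))  # True: 48 + 64 slots in total
print(admit(two_rqs, running, "p_med.q", 64))  # False: 48 + 64 exceeds 64
print(admit(two_rqs, running, "p_med.q", 16))  # True: 48 + 16 fits into 64
```

In this model the single ruleset never applies the "total" rule to
jobs in *_long.q, so the 64-slot cap is not a cap on the sum. With two
rulesets every job has to pass both limits.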

-- Reuti


> But when I try to submit a simple 56-slot parallel job with  
> something like:
>
> echo "sleep 30" | qsub -pe ompix8 56 -l h_rt=47:59:59
>
> the job stays in "qw" state and qstat shows the following:
> ...
> cannot run because it exceeds limit "hristo/////" in rule "users/total"
> cannot run because it exceeds limit "hristo/////" in rule "users/long"
> ...
> The 56-slot requirement clearly exceeds the 48-slot limit from the
> "users/long" rule, but for some obscure reason SGE thinks that it
> also exceeds the 64-slot limit from the "users/total" rule.
>
> I tried to split the ruleset into two separate rules:
>
> {
>   name         users_long
>   description  Limits imposed on ordinary users
>   enabled      TRUE
>   limit        users {*} queues *_long.q to slots=48
> }
> {
>   name         users_total
>   description  Limits imposed on ordinary users
>   enabled      TRUE
>   limit        users {*} to slots=64
> }
>
> Still no luck:
> ...
> cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
> cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
> cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
> ...
>
> The job runs fine if I disable the users_total rule.
>
> We used to run 6.2u2_1 before we upgraded to 6.2u5 and a colleague  
> of mine insists that he was able to run 56-slot jobs before the
> upgrade. Have I stumbled upon a bug in 6.2u5 or did I miss the point  
> in setting up my resource quotas?
>
> Any help would be greatly appreciated.
>
> Hristo
> --
> Dr Hristo Iliev
> Monte Carlo research group
> Faculty of Physics, University of Sofia
> 5 James Bourchier blvd, 1164 Sofia, Bulgaria
> http://cluster.phys.uni-sofia.bg/hristo/
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247462
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247465



