[GE users] Strange problem with resource quotas in 6.2u5

reuti reuti at staff.uni-marburg.de
Tue Mar 9 14:16:30 GMT 2010


Hi,

On 08.03.2010 at 11:03, icaci wrote:

> <snip>
> Resources just became available and I was able to conduct some
> tests. The results are beyond my comprehension. What I did was to
> submit the same sleeper parallel job, which requires 56 slots and
> h_rt=47:59:59. The RQS in place are:
>
> {
>   name         usr_long
>   description  Limits imposed on ordinary users
>   enabled      TRUE
>   limit        users {*} queues *_long.q to slots=48
> }
> {
>   name         usr_med+long
>   description  Limits imposed on ordinary users
>   enabled      TRUE
>   limit        users {*} queues *_med.q,*_long.q to slots=64
> }
>
> This setup works and the job ends in p_med.q as expected. But if I  
> change usr_med+long to
>  limit users {*} queues * to slots=64
> or to
>  limit users {*} queues *.q to slots=64
> or just to
>  limit users {*} to slots=64
> I get
>  cannot run because it exceeds limit "hristo/////" in rule "usr_med+long/1"
>
> It might be connected somehow to issue 2538, but changing
> queue_sort_method to load does not make the job run.
>
> Either I don't understand how RQS filter matching works or I should
> do some debugging. For now I will stick to specifying the full list
> of queues, which does the trick.

Maybe you can add this to the issue. It's still not clear to me whether it
is related to the queue_sort_method and/or the usage of a wildcard.
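
For reference when adding to the issue, here is a minimal Python sketch of
the matching one would expect from the rule sets quoted above. It is only a
model of the expected arithmetic, not SGE's actual implementation: the
function name violated_rules, the queue name p_long.q, and the assumption
that the user holds no other slots are all made up for illustration, and the
queue wildcards are approximated with Python's fnmatch.

```python
from fnmatch import fnmatchcase

# Hypothetical model of the two rule sets from the quoted message,
# with usr_med+long changed to the wildcard form "queues *".
# Each entry: (rule name, queue patterns, slot limit).
RULES = [
    ("usr_long/1", ["*_long.q"], 48),
    ("usr_med+long/1", ["*"], 64),
]


def violated_rules(queue, requested_slots, used_slots=0, rules=RULES):
    """Return the names of rules the request would exceed in `queue`.

    Simplification: the whole job lands in a single queue, and
    `used_slots` is what the user already occupies under every rule.
    """
    violated = []
    for name, patterns, limit in rules:
        # A rule applies only if one of its queue patterns matches.
        applies = any(fnmatchcase(queue, p) for p in patterns)
        if applies and used_slots + requested_slots > limit:
            violated.append(name)
    return violated


# 56 slots in p_med.q: only the 64-slot rule applies, and 56 <= 64.
print(violated_rules("p_med.q", 56))   # []
# 56 slots in an assumed p_long.q: the 48-slot rule is exceeded.
print(violated_rules("p_long.q", 56))  # ['usr_long/1']
```

Under this model a 56-slot job placed entirely in p_med.q violates neither
rule, which is why the rejection by "usr_med+long/1" looks wrong.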

-- Reuti


> Thanks for your time,
>
> Hristo
>
>>
>>> Best regards,
>>>
>>> Hristo
>>>
>>>>
>>>>> But when I try to submit a simple 56-slot parallel job with
>>>>> something like:
>>>>>
>>>>> echo "sleep 30" | qsub -pe ompix8 56 -l h_rt=47:59:59
>>>>>
>>>>> the job stays in "qw" state and qstat shows the following:
>>>>> ...
>>>>> cannot run because it exceeds limit "hristo/////" in rule "users/total"
>>>>> cannot run because it exceeds limit "hristo/////" in rule "users/long"
>>>>> ...
>>>>> The 56 slots requirement clearly exceeds the 48 slots limit from the
>>>>> "users/long" rule, but for some obscure reason SGE thinks that it
>>>>> also exceeds the 64-slots limit from the "users/total" rule.
>>>>>
>>>>> I tried to split the ruleset into two separate rules:
>>>>>
>>>>> {
>>>>> name         users_long
>>>>> description  Limits imposed on ordinary users
>>>>> enabled      TRUE
>>>>> limit        users {*} queues *_long.q to slots=48
>>>>> }
>>>>> {
>>>>> name         users_total
>>>>> description  Limits imposed on ordinary users
>>>>> enabled      TRUE
>>>>> limit        users {*} to slots=64
>>>>> }
>>>>>
>>>>> Still no luck:
>>>>> ...
>>>>> cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
>>>>> cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
>>>>> cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
>>>>> ...
>>>>>
>>>>> The job runs fine if I disable the users_total rule.
>>>>>
>>>>> We used to run 6.2u2_1 before we upgraded to 6.2u5 and a colleague
>>>>> of mine insists that he was able to run 56-slots jobs before the
>>>>> upgrade. Have I stumbled upon a bug in 6.2u5 or did I miss the point
>>>>> in setting up my resource quotas?
>>>>>
>>>>> Any help would be greatly appreciated.
>>>>>
>>>>> Hristo
>>>>> --
>>>>> Dr Hristo Iliev
>>>>> Monte Carlo research group
>>>>> Faculty of Physics, University of Sofia
>>>>> 5 James Bourchier blvd, 1164 Sofia, Bulgaria
>>>>> http://cluster.phys.uni-sofia.bg/hristo/
>>>>>
>>>>> ------------------------------------------------------
>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247462
>>>>>
>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>>
>>>>
>>>
>>>
>>
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247694

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


