[GE users] Strange problem with resource quotas in 6.2u5

icaci hristo at mc.phys.uni-sofia.bg
Sun Mar 7 22:56:24 GMT 2010


Hello all!

I'm seeing some odd behaviour in the resource quota subsystem of our 6.2u5 installation. We have two types of queues, each in a parallel and a batch flavour:
- for long running jobs (p_long.q and b_long.q);
- for jobs with h_rt up to 48 hours (p_med.q and b_med.q).

I want to limit our users to 64 slots in total but give them only 48 slots for long running jobs, so I've set up the following resource quota rule set:

{
   name         users
   description  Limits imposed on ordinary users
   enabled      TRUE
   limit        name long users {*} queues *_long.q to slots=48
   limit        name total users {*} to slots=64
}

But when I try to submit a simple 56-slot parallel job with something like:

echo "sleep 30" | qsub -pe ompix8 56 -l h_rt=47:59:59

the job stays in "qw" state and qstat shows the following:
...
cannot run because it exceeds limit "hristo/////" in rule "users/total"
cannot run because it exceeds limit "hristo/////" in rule "users/long"
...
The 56-slot request clearly exceeds the 48-slot limit of the "users/long" rule, but for some obscure reason SGE thinks that it also exceeds the 64-slot limit of the "users/total" rule.
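To spell out what I expect, here is a minimal Python sketch of rule evaluation as I understand it from sge_resource_quota(5): within a rule set only the first matching limit applies, and a job must satisfy every enabled rule set. This is a toy model, not SGE's actual scheduler code. The names `first_match` and `blocking_rules` are made up for illustration, and the model assumes the user holds no other slots at submission time (`used=0`).

```python
from fnmatch import fnmatch

# Each rule set is an ordered list of (rule_name, queue_pattern, slot_limit).
# A queue_pattern of None means the rule has no queue filter and matches any queue.
USERS = [
    ("long", "*_long.q", 48),
    ("total", None, 64),
]


def first_match(rule_set, queue):
    """Return (name, limit) of the first rule whose queue filter matches, else None."""
    for name, pattern, limit in rule_set:
        if pattern is None or fnmatch(queue, pattern):
            return name, limit
    return None


def blocking_rules(rule_sets, queue, slots, used=0):
    """List the 'set/rule' names that would reject `slots` more slots in `queue`."""
    blocked = []
    for set_name, rule_set in rule_sets.items():
        match = first_match(rule_set, queue)
        if match is not None and used + slots > match[1]:
            blocked.append(f"{set_name}/{match[0]}")
    return blocked


# The single rule set, and the same limits split into two rule sets.
single = {"users": USERS}
split = {
    "users_long": [("1", "*_long.q", 48)],
    "users_total": [("1", None, 64)],
}

for queue in ("p_med.q", "p_long.q"):
    print(queue, blocking_rules(single, queue, 56), blocking_rules(split, queue, 56))
```

In this model the 56-slot job is rejected only for queues matching *_long.q and is free to start in p_med.q, which is why the "users/total" message surprises me.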

I tried splitting the rule set into two separate rule sets:

{
   name         users_long
   description  Limits imposed on ordinary users
   enabled      TRUE
   limit        users {*} queues *_long.q to slots=48
}
{
   name         users_total
   description  Limits imposed on ordinary users
   enabled      TRUE
   limit        users {*} to slots=64
}

Still no luck:
...
cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
cannot run because it exceeds limit "hristo/////" in rule "users_total/1"
...

The job runs fine if I disable the users_total rule set.

We used to run 6.2u2_1 before we upgraded to 6.2u5, and a colleague of mine insists that he was able to run 56-slot jobs before the upgrade. Have I stumbled upon a bug in 6.2u5, or did I miss the point in setting up my resource quotas?

Any help would be greatly appreciated.

Hristo
--
Dr Hristo Iliev
Monte Carlo research group
Faculty of Physics, University of Sofia
5 James Bourchier blvd, 1164 Sofia, Bulgaria
http://cluster.phys.uni-sofia.bg/hristo/

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247462
