[GE users] Help with resource quota sets

jallen at it.uts.edu.au
Fri Jan 25 04:20:28 GMT 2008

Hi,

Firstly, sorry for the long mail.

We have been experimenting with the new RQS feature as a way of limiting
resources for specific projects (using GE 6.1u2). Unfortunately, our setup
has a legacy attribute, "big_slots", which is a forced consumable attached
to each host indicating how many processes the host can run, so we submit
jobs with:

qsub -l big_slots=2 <args>

We have the following example resource quota set, which a cron job
enables automatically at 9AM and disables at 9PM by flipping its
"enabled" flag.

% qconf -srqs
{
   name         project_slots
   description  "divide the available daytime slots between projects"
   enabled      TRUE
   limit        projects show1 hosts !@workstations to big_slots=40
   limit        projects show2,show3 hosts !@workstations to big_slots=214
}
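
The cron-driven toggle described above could be done by fetching the set with `qconf -srqs`, flipping the flag and loading the result back. A minimal Python sketch of the text transformation follows; the function name `set_enabled` is hypothetical, and it assumes the input is the `qconf -srqs` output for a single set, laid out as in the example above.

```python
import re

def set_enabled(rqs_text: str, enabled: bool) -> str:
    """Return a copy of one RQS definition with its 'enabled' flag set.

    Assumes rqs_text is the output of `qconf -srqs <name>` for a single
    resource quota set, laid out as in the example above.
    """
    value = "TRUE" if enabled else "FALSE"
    # Rewrite only the value on the 'enabled' line, keeping key and spacing.
    new_text, count = re.subn(
        r"^([ \t]*enabled[ \t]+)\S+",
        lambda m: m.group(1) + value,
        rqs_text,
        count=1,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("no 'enabled' line found in the RQS definition")
    return new_text
```

A 9AM/9PM cron job could then write the result to a temporary file and load it with `qconf -Mrqs <file>`; we assume `-Mrqs` (modify a resource quota set from a file) is available in 6.1u2, so check qconf(1) for your release before relying on it.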

In case we have this wrong, the rules are intended to do the following:
1. Limit show1 to a total of 40 "big_slots" on non-workstations.
2. Limit show2+show3 to a combined total of 214 "big_slots" on
   non-workstations.

The restrictions are intended to apply only to hosts that aren't in
@workstations, as those hosts do not provide a stable and reliable
resource (so any show can use @workstations without it counting against
the quota).
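
To make the intended semantics concrete, here is a small Python model of how we expect the two rules to admit or refuse a job. It is a sketch under stated assumptions, not how the scheduler is implemented: the host names in `WORKSTATIONS` are hypothetical, rules are checked first-match within the set (our reading of sge_resource_quota(5); with these disjoint project filters the order makes no difference), and each host's own `big_slots` capacity is ignored.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical members of the @workstations host group.
WORKSTATIONS = {"ws01", "ws02"}

@dataclass(frozen=True)
class Rule:
    label: str
    projects: frozenset
    limit: int  # maximum total big_slots for the listed projects combined

# The two limits from project_slots; both exclude @workstations hosts.
RULES = [
    Rule("project_slots/1", frozenset({"show1"}), 40),
    Rule("project_slots/2", frozenset({"show2", "show3"}), 214),
]

Job = Tuple[Optional[str], str, int]  # (project, host, big_slots)

def matching_rule(project: Optional[str], host: str) -> Optional[Rule]:
    """Return the first rule whose filters match, or None if no limit applies."""
    if host in WORKSTATIONS:
        return None  # "hosts !@workstations": workstation use is never counted
    for rule in RULES:
        if project in rule.projects:
            return rule
    return None

def can_start(project: Optional[str], host: str, big_slots: int,
              running: Iterable[Job]) -> bool:
    """Would a new job fit under its quota, given the jobs already running?"""
    rule = matching_rule(project, host)
    if rule is None:
        return True
    used = sum(n for p, h, n in running if matching_rule(p, h) == rule)
    return used + big_slots <= rule.limit
```

Under this model, with show2 jobs already holding 198 big_slots on non-workstation hosts, a further 16-slot show3 job fits under the 214 limit but a 17-slot one is refused, while a job on a workstation is never restricted.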

This seems to work fine, and we even wrote a small tool around qstat to
count quota use.

% python quota.py
{'show2+show3': 198, 'NONE': 16, 'workstations': 18, 'show1': 8}
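
The counting done by quota.py can be sketched as follows. This is a hypothetical reconstruction, not the actual script: it assumes the running jobs have already been extracted from qstat as `(project, host, big_slots)` records (that parsing step is not shown), that jobs whose project matches neither rule (including jobs submitted without -P) are grouped under `NONE`, and that the host names in `WORKSTATIONS` stand in for the real @workstations members.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical members of the @workstations host group.
WORKSTATIONS = {"ws01", "ws02"}

# Map each project to the quota bucket it is counted in.
BUCKETS = {"show1": "show1", "show2": "show2+show3", "show3": "show2+show3"}

def quota_usage(jobs: Iterable[Tuple[Optional[str], str, int]]) -> Dict[str, int]:
    """Sum big_slots per quota bucket for (project, host, big_slots) records."""
    usage: Counter = Counter()
    for project, host, big_slots in jobs:
        if host in WORKSTATIONS:
            key = "workstations"  # never counted against a project quota
        else:
            key = BUCKETS.get(project, "NONE")
        usage[key] += big_slots
    return dict(usage)
```

Fed records matching the tally above, it reproduces the same dictionary, which gives an independent count to compare against the scheduler's view once rule /2 starts refusing jobs.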

Now to the problems we are facing:

If we run the qquota command, it prints nothing, regardless of the
arguments given:

% qquota -P show2
<no output>

A more serious problem is that, at random (usually after 1-5 days), the
quota starts allowing far fewer jobs to run than it should. For jobs
submitted with -P, qstat -j <job> reports that the job can't run because
it exceeds rule /2 of the resource quota set. If we restart the
scheduler/qmaster, the restriction clears and the quota works properly
for a while until it breaks again.

Any ideas why this would happen? Is it a problem with our defined RQSs or
perhaps the use of "big_slots" consumable? Could this be a bug in GE that
is fixed in 6.1u3/u4? Could disabling and enabling the quota with the
enabled attribute create problems?

Many thanks for any assistance.

 -- JA

More information about the gridengine-users mailing list