[GE users] Help with resource quota sets

jallen at it.uts.edu.au jallen at it.uts.edu.au
Wed Jan 30 23:26:11 GMT 2008


>
> Can you please copy and paste the full rule set and the qquota output
> because there is some mismatch between them.
>
> Is it possible that you have extremely short running jobs? In this case
> it could be that some jobs are already finished and undebited and
> therefore not included in the output?
>

Hi Roland,

Sorry for misleading you. There actually appear to be accounting issues
even with a single project in each limit (the example I gave you earlier
was just made-up output to try to describe the problem).

There would be the occasional short-running job, but these are rare and
hardly noticeable when things are working properly. The situation I
describe is ongoing, based on many hours of monitoring. Here is a real
example from the command prompt as I write (show names have been
generically renamed):

% qconf -srqs
{
   name         project_slots
   description  "divide the available daytime slots between projects"
   enabled      TRUE
   limit        projects show1 hosts @r3,@r4,@r5 to big_slots=8
   limit        projects show2 hosts @r3,@r4,@r5 to big_slots=40
   limit        projects show3 hosts @r3,@r4,@r5 to big_slots=214
}

% qquota
resource quota rule limit                filter
--------------------------------------------------------------------------------
project_slots/2    big_slots=-4/40      projects show2 hosts @r3,@r4,@r5
project_slots/3    big_slots=68/214     projects show3 hosts @r3,@r4,@r5

But there is actually hardly anything running:

% python quota.py
{'workstations': 2, 'show3': 9}
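
(For context, quota.py is just a small helper that tallies the slots of
running jobs per project. It isn't reproduced here, but the idea is roughly
the sketch below; it assumes the project and slots columns can be picked out
of the plain-text "qstat -ext" output, so treat the parsing details as
illustrative only.)

#!/usr/bin/env python
# Rough sketch of what quota.py does (not the script itself): tally the
# slots of running jobs per project from plain-text `qstat -ext` output.
# Column positions are taken from the header line; the exact qstat layout
# is an assumption here.
import subprocess
from collections import defaultdict

def running_slots_per_project():
    p = subprocess.Popen(["qstat", "-u", "*", "-s", "r", "-ext"],
                         stdout=subprocess.PIPE)
    out = p.communicate()[0].decode()
    lines = out.splitlines()
    if len(lines) < 3:          # no running jobs at all
        return {}
    header = lines[0].split()
    proj_col = header.index("project")
    slots_col = header.index("slots")
    totals = defaultdict(int)
    for line in lines[2:]:      # skip the header and the dashed separator
        fields = line.split()
        if len(fields) <= slots_col:
            continue
        totals[fields[proj_col]] += int(fields[slots_col])
    return dict(totals)

if __name__ == "__main__":
    print(running_slots_per_project())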

The negative resource usage is difficult to explain. But look what happens
when some jobs start with -P show2:

% python quota.py
{'workstations': 2, 'show2': 20, 'show3': 9}

% qquota
resource quota rule limit                filter
--------------------------------------------------------------------------------
project_slots/2    big_slots=16/40      projects show2 hosts @r3,@r4,@r5
project_slots/3    big_slots=68/214     projects show3 hosts @r3,@r4,@r5
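
For diagnosis, a rough cross-check along these lines (a sketch only, assuming
the qquota line format shown above) would flag any project where the debited
usage disagrees with the real running-slot count:

# Rough sketch of a cross-check: parse the debited usage out of qquota's
# "limit" column (e.g. big_slots=-4/40) and compare it per project against
# the real running-slot counts.
import re
import subprocess

QQUOTA_LINE = re.compile(
    r"^\S+\s+\w+=(?P<used>-?\d+)/\d+\s+projects\s+(?P<project>\S+)")

def qquota_usage():
    p = subprocess.Popen(["qquota"], stdout=subprocess.PIPE)
    out = p.communicate()[0].decode()
    usage = {}
    for line in out.splitlines():
        m = QQUOTA_LINE.match(line)
        if m:
            usage[m.group("project")] = int(m.group("used"))
    return usage

def report_drift(actual):
    """actual: project -> running slots, e.g. the dict quota.py prints."""
    for project, debited in qquota_usage().items():
        real = actual.get(project, 0)
        if debited != real:
            print("%s: qquota has debited %d, actually running %d"
                  % (project, debited, real))

Run right after quota.py the next time the mismatch shows up, this should
pinpoint which project's debit has gone stale.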

Restarting the qmaster and scheduler:
% /etc/init.d/sgemaster stop
   Shutting down Grid Engine scheduler
   Shutting down Grid Engine qmaster
% /etc/init.d/sgemaster start
   starting sge_qmaster
   starting sge_schedd

And everything is okay again (!):

% qquota
resource quota rule limit                filter
--------------------------------------------------------------------------------
project_slots/3    big_slots=9/214      projects show3 hosts @r3,@r4,@r5

% python quota.py
{'workstations': 2, 'show3': 9}

I hope this is better information than before. I still suspect this
happens when a cron job flips the "enabled" flag on the RQS. Let me know
if I can do anything to help diagnose why this would happen.
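
For reference, the enabled-flag toggle in question would amount to something
like the sketch below. This is hypothetical, not our actual cron script, and
it assumes qconf's -srqs and -Mrqs options round-trip the rule set as
documented:

#!/usr/bin/env python
# Hypothetical sketch of an RQS enable/disable toggle (not the real cron
# script). Assumes `qconf -srqs NAME` dumps the rule set as text and
# `qconf -Mrqs FILE NAME` loads it back.
import os
import subprocess
import sys
import tempfile

def set_rqs_enabled(name, enabled):
    p = subprocess.Popen(["qconf", "-srqs", name], stdout=subprocess.PIPE)
    text = p.communicate()[0].decode()
    flag = "TRUE" if enabled else "FALSE"
    lines = []
    for line in text.splitlines():
        if line.strip().startswith("enabled"):
            line = "   enabled      " + flag
        lines.append(line)
    fd, path = tempfile.mkstemp(suffix=".rqs")
    f = os.fdopen(fd, "w")
    f.write("\n".join(lines) + "\n")
    f.close()
    try:
        subprocess.call(["qconf", "-Mrqs", path, name])
    finally:
        os.unlink(path)

if __name__ == "__main__":
    # e.g. "toggle_rqs.py on" in the morning, "toggle_rqs.py off" at night
    set_rqs_enabled("project_slots", sys.argv[1] == "on")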

Regards,

 -- JA


