[GE users] load_thresholds questions

reuti reuti at staff.uni-marburg.de
Mon Nov 15 13:54:36 GMT 2010


Hi,

On 10.11.2010 at 12:43, chambon wrote:

> I am trying to understand load_thresholds in detail.
> 
> I want to limit the dispatching of jobs to worker nodes according to CPU load (and not by using np_load_avg).
> 
> In my understanding, cpu is a predefined complex that represents the percentage of CPU used (expressed between 0 and 100), so
> I defined a load_formula like:  load_formula  cpu*0.01
> and with 
> job_load_adjustments              cpu=20   (20 to express 20%)
> load_adjustment_decay_time        00:00:40

This is a really short time, and I assume it is in the same range as the load_report_time.


> I also defined a queue (8 slots per worker node) with load_thresholds set to 90 (90 to express 90%)
>> qconf -sq demo.q | grep load_thresholds
> load_thresholds       cpu=90
> 
> I first submit 5 jobs 
> qsub -l h=ccwalk38 -t 1-5:1 -q demo.q UseCPU.script
> 
> cpu should be around 5 / 8 ~= 62%,
> which matches what qconf reports:
>  qconf -se ccwalk38 | grep cpu
>                      cpu=62.600000
> 
> Then I submit 5 new jobs (the previous ones are still running)
> qsub -l h=ccwalk38 -t 1-5:1 -q demo.q UseCPU.script
> 
> Only 2 of the new jobs start running => ok

I think only one should start. That puts you at about 82%, and a second one, which would bring it to about 102%, shouldn't be scheduled.
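
To make that arithmetic concrete, here is a minimal sketch in Python. It is only an illustration of the reasoning above, not the scheduler's actual code. It assumes that the adjustments of the first five jobs have already decayed completely, that each new job counts with its full job_load_adjustments value, and that a job is admitted only if the load including its own adjustment stays below the threshold. The function name jobs_admitted is made up for this example.

```python
def jobs_admitted(measured_load, per_job_adjustment, threshold, requested):
    """Count how many new jobs fit below the threshold.

    Assumption (hypothetical reading of the rule): a job is admitted only
    if the projected load, including its own full adjustment, is still
    strictly below the threshold.
    """
    admitted = 0
    projected = measured_load
    for _ in range(requested):
        candidate = projected + per_job_adjustment
        if candidate >= threshold:
            # The queue instance would count as overloaded with this job.
            break
        projected = candidate
        admitted += 1
    return admitted


# Values from this thread: cpu=62.6, cpu adjustment 20, load_thresholds cpu=90
print(jobs_admitted(62.6, 20.0, 90.0, 5))
```

With the values from this thread the first job gives 82.6 and the second would give 102.6, so under these assumptions the function returns 1.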


> Looking at the pending jobs, I see :
> qstat -s p -j ...
> scheduling info:            queue instance "demo.q@ccwalk38.in2p3.fr" dropped because it is overloaded: cpu=127.600000 (= 62.600000 + 20 * 3.250000)

At the beginning of the load_adjustment_decay_time the factor is 1 per job, and it is then lowered to zero. The 3.25 is the sum of the shares of all running jobs that are still within their load_adjustment_decay_time. It could be 0.75 + 0.75 + 0.75 + 0.50 + 0.50 or whatever. Which job contributes which amount to the total isn't printed.
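
As a rough sketch of how such a sum could come about, assuming a linear decay from 1 to 0 over the load_adjustment_decay_time: the job ages below (10 s and 20 s) are invented purely to reproduce the 0.75 and 0.50 shares of the example, and the function names are hypothetical, not taken from Grid Engine.

```python
def decay_share(age_s, decay_time_s):
    """Share of the adjustment still applied to a job of the given age.

    Assumes a linear decay: 1.0 at dispatch, 0.0 once decay_time_s has passed.
    """
    return max(0.0, 1.0 - age_s / decay_time_s)


def adjusted_load(measured, adjustment, ages_s, decay_time_s):
    """Return (adjusted load, summed share) for the given job ages."""
    total_share = sum(decay_share(age, decay_time_s) for age in ages_s)
    return measured + adjustment * total_share, total_share


# Hypothetical ages in seconds since dispatch, decay time 00:00:40
ages = [10, 10, 10, 20, 20]
load, share = adjusted_load(62.6, 20.0, ages, 40.0)
print(f"cpu={load:.6f} (= 62.600000 + 20 * {share:.6f})")
```

With these made-up ages the summed share is 3.25, so the printed value corresponds to the cpu=127.600000 shown in the scheduling info.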

-- Reuti 


> >= 90
> 
> 
> The question is: where does the 3.25 come from?
> 
> Why isn't it just 1.0 at the start, 0.5 at the halfway point, etc.?
> (load_adjustment is decayed linearly over time)
> 
> Best regards
> Bernard CHAMBON
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=294479
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=295857
