[GE users] how to throttle jobs into a queue

david zanella zanella at mayo.edu
Mon Aug 27 16:56:59 BST 2007


Just wanted to report back on my success. I got this to work with exactly the behaviour I wanted.

Looking at the jobs, you can see about a 7-8 minute gap between jobs. 

 123178 0.55000 Sim.c9.ped ecarlson     r     08/27/2007 09:09:01 cc32@crush.mayo.edu            1
 123179 0.55000 Sim.c9.ped ecarlson     r     08/27/2007 09:16:01 cc32@crush.mayo.edu            1
 123180 0.55000 Sim.c9.ped ecarlson     r     08/27/2007 09:23:47 cc32@crush.mayo.edu            1
 123182 0.55000 Sim.c9.ped ecarlson     r     08/27/2007 09:32:01 cc32@crush.mayo.edu            1
 123183 0.55000 Sim.c10.pe ecarlson     r     08/27/2007 09:39:31 cc32@crush.mayo.edu            1
 123184 0.55000 Sim.c10.pe ecarlson     r     08/27/2007 09:47:16 cc32@crush.mayo.edu            1
 123185 0.55000 Sim.c10.pe ecarlson     r     08/27/2007 09:55:01 cc32@crush.mayo.edu            1
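
For anyone who wants to check the spacing rather than eyeball it, here is a small sketch that computes the gaps from the start times in the listing above. The helper name gaps_in_seconds is just something made up for this example; the timestamps are copied from the listing.

```python
from datetime import datetime

# Start times copied from the qstat listing above.
starts = [
    "08/27/2007 09:09:01",
    "08/27/2007 09:16:01",
    "08/27/2007 09:23:47",
    "08/27/2007 09:32:01",
    "08/27/2007 09:39:31",
    "08/27/2007 09:47:16",
    "08/27/2007 09:55:01",
]


def gaps_in_seconds(timestamps):
    """Return the gaps between consecutive start times, in seconds."""
    parsed = [datetime.strptime(t, "%m/%d/%Y %H:%M:%S") for t in timestamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


gaps = gaps_in_seconds(starts)
for gap in gaps:
    print(f"{gap // 60} min {gap % 60:02d} s")
```

The shortest gap is 7 minutes flat and the longest is a little over 8, which matches the "7-8 minute" figure.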

I accomplished this by artificially overloading np_load_avg, and giving that overload
a 15 minute decay time. 

Using the GUI:

Schedule config -> load adjustment tab -> in the column, enter np_load_avg
and set its value to 100. Adjust the decay time in the same pane. Mine is
15 minutes.

Using the command line:

qconf -msconf

job_load_adjustments              np_load_avg=100
load_adjustment_decay_time        00:15:00
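
To get a rough feel for why these two settings space the jobs out, here is a back-of-the-envelope sketch. It rests on several assumptions, so treat it as a model and not as what the scheduler actually does: (1) the adjustment decays linearly from 100% at job start to 0% after load_adjustment_decay_time, which is how I read sched_conf(5); (2) the adjustment is divided by 32 processors, a figure inferred from the qstat -j numbers later in this message and not taken from any config; (3) the real load sits around 0.31 and the load threshold is 1.75, both read off that same qstat -j output. The function names are made up for this example.

```python
def adjusted_load(base_load, adjustment, minutes_since_start, decay_minutes, nproc):
    """Load the scheduler would see after one job start, assuming linear decay."""
    remaining = max(0.0, 1.0 - minutes_since_start / decay_minutes)
    return base_load + adjustment * remaining / nproc


def minutes_until_below(base_load, adjustment, decay_minutes, nproc, threshold):
    """Minutes after a single job start until the adjusted load is under threshold.

    Returns None if the real load alone already reaches the threshold.
    """
    headroom = threshold - base_load
    if headroom <= 0:
        return None
    # Fraction of the full adjustment that still fits under the threshold.
    allowed_fraction = headroom * nproc / adjustment
    if allowed_fraction >= 1.0:
        return 0.0
    return decay_minutes * (1.0 - allowed_fraction)


# Assumed inputs: base load 0.31, adjustment 100, 15 minute decay,
# 32 processors (inferred), threshold 1.75.
wait = minutes_until_below(0.31, 100, 15, 32, 1.75)
print(round(wait, 1))
```

Under those assumptions the model gives a wait of roughly eight minutes per job, which is in the same neighborhood as the observed gaps. The real scheduler runs at discrete intervals and the real load moves around, so an exact match should not be expected.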


The only way to watch it work is to run qstat -j on one of the waiting jobs
and look hard at the output:

queue instance "cc32@crush.mayo.edu" dropped because it is overloaded:
np_load_avg=4.623291 (= 0.310791 + 100 * 1.380000 with nproc=1) >= 1.75

Wait a few minutes and you'll see the "overloaded" queue's np_load_avg drop:

queue instance "cc32@crush.mayo.edu" dropped because it is overloaded:
np_load_avg=4.439819 (= 0.314819 + 100 * 1.320000 with nproc=1) >= 1.75
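
A note on reading those numbers: taken literally, 0.310791 + 100 * 1.38 is about 138, not 4.62. Both messages do work out exactly if the adjustment term is divided by 32, so the sketch below assumes that divisor. The 32 is an inference from the two messages (it fits both to six decimals), not something documented here, and the function name is made up for the example.

```python
NPROC_ASSUMED = 32  # inferred from the two messages above, not from any config


def reported_load(real_load, adjustment, factor, nproc=NPROC_ASSUMED):
    """Recompute the np_load_avg figure shown in the qstat -j message."""
    return real_load + adjustment * factor / nproc


first = reported_load(0.310791, 100, 1.38)
second = reported_load(0.314819, 100, 1.32)
print(f"{first:.6f}")   # 4.623291
print(f"{second:.6f}")  # 4.439819
```

Between the two messages the factor drops from 1.38 to 1.32 while the real load barely moves, so the decaying adjustment is what brings the total down.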


> -----Original Message-----
> From: david zanella [mailto:zanella at mayo.edu] 
> Sent: Friday, August 24, 2007 11:46 AM
> To: users at gridengine.sunsource.net
> Subject: [GE users] how to throttle jobs into a queue
>
>
> I have a group of users that are submitting jobs to my grid.  The jobs
> do some sort of pedigree/chromosome calculations. It is impossible for
> the user to predict or control the amount of memory for each job.
> Consequently, some jobs will start out small, grow to about 2G in
> size, and run for weeks; other jobs can be as small as a few hundred
> meg and finish in an hour.
>
> I have set up load thresholds that will suspend job submission if the
> available mem_free < 2G or swap_used > 6G.  For the most part, this
> works well.  I have 7 T2000's for execute hosts.
>
> Here's the problem:
>
> My T2000's have 32G of memory and I have 30 slots for each. With the
> load thresholds in place, say the server is only running 20 jobs. A job
> completes and the server is now below its load threshold. The qmaster
> sees this and immediately shoves 11 jobs at the server.  Pretty soon,
> the jobs grow, I run out of memory and swap, and jobs start crashing.
>
> What I need is some way to throttle the acceptance rate to the server.
> To tell the server to accept one job, then re-evaluate in, say, 15 or
> 30 minutes. If the load thresholds give a green light, it'll accept
> another job.
>
> I've looked at sched_conf, and it has what appears to be what I need.
> I've made various adjustments to job_load_adjustments and
> load_adjustment_decay_time, but these have not had any effect.
>
> Am I missing something? Is there a better way to accomplish what I'm
> trying to do?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
