[GE users] how to throttle jobs into a queue

david zanella zanella at mayo.edu
Fri Aug 24 16:46:02 BST 2007


I have a group of users who are submitting jobs to my grid.  The jobs
do some sort of pedigree/chromosome calculations.  It is impossible for
the user to predict or control the amount of memory each job will use.
Consequently, some jobs start out small, grow to about 2G in size, and
run for weeks; other jobs can be as small as a few hundred megabytes
and finish in an hour.

I have set up load thresholds that stop new jobs from being dispatched
to a host if mem_free < 2G or swap_used > 6G.  For the most part, this
works well.  I have 7 T2000s for execute hosts.

Here's the problem:

Each T2000 has 32G of memory and 30 slots.  With the load thresholds
in place, say a server is running only 20 jobs.  A job completes and
the server is now below its load threshold.  The qmaster sees this and
immediately shoves 11 jobs at the server.  Pretty soon the jobs grow,
I run out of memory and swap, and jobs start crashing.
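
A rough back-of-the-envelope check of the numbers above. The slot,
memory, and job counts come from this message; treating every job as
eventually growing to about 2G is a worst-case assumption made only for
illustration.

```python
# Figures from the message above. "Every job grows to 2G" is a
# worst-case assumption, not a measured fact.
SLOTS_PER_HOST = 30
HOST_MEMORY_GB = 32
PEAK_JOB_GB = 2
RUNNING_BEFORE = 20


def free_slots_after_completion(slots, running):
    """Slots open on a host right after one running job finishes."""
    return slots - (running - 1)


def worst_case_memory_gb(jobs, peak_gb):
    """Memory needed if every one of `jobs` reaches `peak_gb`."""
    return jobs * peak_gb


burst = free_slots_after_completion(SLOTS_PER_HOST, RUNNING_BEFORE)
demand = worst_case_memory_gb(SLOTS_PER_HOST, PEAK_JOB_GB)
print(f"jobs dispatched in one burst: {burst}")
print(f"worst-case demand, all slots full: {demand}G vs {HOST_MEMORY_GB}G RAM")
```

Under that assumption a full host could want nearly twice its physical
memory, which is why filling every free slot at once is risky here.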

What I need is some way to throttle the rate at which a server accepts
jobs: tell the server to accept one job, then re-evaluate in, say, 15
or 30 minutes.  If the load thresholds give a green light, it accepts
another job.
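
The policy described above can be sketched as follows. This is only an
illustration of the desired behaviour, not Grid Engine code: HostLoad,
thresholds_ok and ThrottledHost are hypothetical names, and the 15
minute interval is one of the two values suggested above.

```python
# Hypothetical sketch of the desired policy. None of these names are
# Grid Engine APIs; they only illustrate "accept one, wait, re-check".
from dataclasses import dataclass


@dataclass
class HostLoad:
    mem_free_gb: float
    swap_used_gb: float


def thresholds_ok(load, min_mem_free_gb=2.0, max_swap_used_gb=6.0):
    """Green light only while mem_free >= 2G and swap_used <= 6G."""
    return (load.mem_free_gb >= min_mem_free_gb
            and load.swap_used_gb <= max_swap_used_gb)


class ThrottledHost:
    def __init__(self, interval_min=15):
        self.interval_min = interval_min
        self.last_accept_min = None  # time of the last accepted job

    def try_accept(self, now_min, load):
        """Accept at most one job per interval, and only if load allows."""
        waited = (self.last_accept_min is None
                  or now_min - self.last_accept_min >= self.interval_min)
        if waited and thresholds_ok(load):
            self.last_accept_min = now_min
            return True
        return False


host = ThrottledHost(interval_min=15)
healthy = HostLoad(mem_free_gb=10.0, swap_used_gb=1.0)
# The scheduler offers a job every 5 minutes; only some are accepted.
accepted = [t for t in range(0, 46, 5) if host.try_accept(t, healthy)]
print(accepted)
```

Even with a healthy host and a job offered every 5 minutes, only one
job per 15 minute window is accepted, so each new job has time to grow
before the thresholds are checked again.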

I've looked at sched_conf, and it appears to have what I need.  I've
made various adjustments to job_load_adjustments and
load_adjustment_decay_time, but these have not had any effect.
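
For reference, a minimal sketch of how these two settings are
understood to interact, going by the sched_conf man page: each newly
dispatched job adds an artificial load to its host, and that extra load
decays linearly to zero over load_adjustment_decay_time. The function
names and the sample numbers below are illustrative assumptions, not
values from the actual configuration, and the real scheduler may differ
in detail.

```python
# Illustrative model only. The function names and numbers are
# assumptions; they are not part of Grid Engine.

def decayed_adjustment(adjustment, elapsed_min, decay_min):
    """Per-job load adjustment, decayed linearly to 0 over decay_min."""
    if decay_min <= 0 or elapsed_min >= decay_min:
        return 0.0
    return adjustment * (1.0 - max(elapsed_min, 0) / decay_min)


def effective_load(reported, job_ages_min, adjustment, decay_min):
    """Reported load plus the remaining adjustment of each recent job."""
    extra = sum(decayed_adjustment(adjustment, age, decay_min)
                for age in job_ages_min)
    return reported + extra


# Two jobs dispatched 0 and 15 minutes ago, 30 minute decay time.
load = effective_load(reported=4.0, job_ages_min=[0, 15],
                      adjustment=2.0, decay_min=30)
print(load)
```

In this model a just-started job counts for its full adjustment and a
job halfway through the decay time counts for half of it, so a burst of
new jobs should push a host over its threshold sooner than the reported
load alone would.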

Am I missing something? Is there a better way to accomplish what I'm
trying to do?
