[GE users] Load Balancing

Chris Dagdigian dag at sonsorol.org
Thu Nov 3 15:57:29 GMT 2005

Hi Ron,

You probably found most of the good material already. The keywords  
you should zero in on in the docs and other materials are things  
related to Grid Engine "policies". The policy mechanism is how grid  
engine flexibly implements/enforces/controls "resource allocation"  
which includes load balancing.

You may get some better tips/hints/pointers from people on this list  
if you explain what you think you would like to be doing with SGE 6.

Without that info, I'll try to give the 10,000-foot summary:

If you do nothing but install a default version of Grid Engine you  
will get "load balancing" out of the box. Jobs will be distributed  
across all your available job slots and the jobs will be allocated to  
"least busy" machines first. The allocation will continue  
automatically as long as there are jobs awaiting dispatch and (a)  
there are available job slots and (b) compute nodes have not tripped  
any "I'm too busy!" alarm thresholds.
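
To make the out-of-the-box behavior concrete, here is a minimal Python sketch of "least busy first" placement. It is not Grid Engine code: the Host fields, the fixed load added per job, and the threshold value are all assumptions chosen for illustration. It only shows the two conditions above, a free slot and an untripped alarm threshold, gating each placement.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    slots: int    # total job slots on this host (hypothetical field)
    used: int     # slots currently occupied
    load: float   # current load figure (hypothetical metric)
    alarm: float  # "I'm too busy!" threshold (illustrative value)


def pick_host(hosts):
    """Return the least-loaded host that has a free slot and is below
    its alarm threshold, or None if no host qualifies."""
    eligible = [h for h in hosts if h.used < h.slots and h.load < h.alarm]
    if not eligible:
        return None
    return min(eligible, key=lambda h: h.load)


def dispatch(pending, hosts, load_per_job=0.5):
    """Place pending jobs one at a time.

    Assumption: every job adds a fixed amount of load to its host.
    Returns (placements, jobs_still_pending).
    """
    placements = []
    remaining = list(pending)
    while remaining:
        host = pick_host(hosts)
        if host is None:
            break  # no free slot, or every host with a free slot is in alarm
        job = remaining.pop(0)
        host.used += 1
        host.load += load_per_job
        placements.append((job, host.name))
    return placements, remaining
```

With three two-slot hosts where one is already above its threshold, that host receives nothing, the idlest host fills first, and any jobs left over simply stay pending until a slot frees up.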

You should also understand that "load balancing" in our community  
means 'distributing serial, parallel or interactive jobs across many  
nodes'. It does not mean "running mysql across many nodes" or "load  
balancing 100 apache webservers" etc. Remember that the term "load  
balancing" means different things to different groups of people...

If you do nothing to a default install, your jobs will be dispatched  
mostly in FIFO order (first in, first out). This means that jobs will  
go out for execution in the order in which they were received by the  
system. This is fine for small systems, small workloads or SGE  
clusters run by single users. This is not so good in situations where  
there are multiple people/projects/departments all trying to get work  
done at the same time.

Moving away from FIFO scheduling behavior is where SGE "policies"  
come into play.

The behavior that most people want (at least initially before they  
really start tweaking and tuning) is often described as "fair share".  
People want SGE to manage the allocation of compute resources on a  
"fair" basis between users, departments, projects or groups.

A practical example of this: Imagine a scenario where User A submits  
10,000 jobs to a 10-node cluster and then 1 hour later User B submits  
another 10,000 jobs...

  - In FIFO scheduling, all 10,000 jobs belonging to User A would  
have to finish before User B got a chance to have his/her pending  
jobs run.

  - In "fairshare" scheduling, User A may be using 100% of available  
cluster resources, but as User A's jobs finish and "drain" out of the  
system, the next jobs to be dispatched would belong to User B. User  
B's jobs would continue being dispatched (vaulting over any remaining  
pending jobs belonging to User A) until there is a 50-50 mixture of  
running jobs between User A and User B. At that point, the SGE  
scheduler would probably start alternating job dispatch to keep the  
50-50 split in effect.
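
The two cases above can be sketched as a toy simulation in Python. This is an illustration only, not how the SGE scheduler is implemented (the real policies work with tickets, and the share tree also accounts for past usage). The assumptions here: a 10-slot cluster, one job finishing per step, and a "fair" rule that simply picks the user with the fewest running jobs. All function names are made up for this sketch.

```python
from collections import Counter, deque


def next_user_fifo(pending, running):
    """FIFO: take the next job from the earliest submitter that still
    has pending work (dict order is submission order)."""
    for user, count in pending.items():
        if count > 0:
            return user
    return None


def next_user_fair(pending, running):
    """Toy fair share: among users with pending jobs, pick the one with
    the fewest running jobs (ties go to the earlier submitter)."""
    candidates = [u for u, c in pending.items() if c > 0]
    if not candidates:
        return None
    counts = Counter(running)
    return min(candidates, key=lambda u: counts[u])


def simulate(pending, running, steps, pick):
    """Each step the oldest running job finishes and one pending job is
    dispatched into the freed slot. Returns running jobs per user."""
    pending = dict(pending)
    running = deque(running)
    for _ in range(steps):
        running.popleft()
        user = pick(pending, running)
        if user is None:
            break
        pending[user] -= 1
        running.append(user)
    return Counter(running)


# User A already fills all 10 slots; User B submitted an hour later.
pending = {"A": 9990, "B": 10000}
running = ["A"] * 10

print(simulate(pending, running, 20, next_user_fifo))
print(simulate(pending, running, 20, next_user_fair))
```

Under the FIFO rule User B never gets a slot while User A still has pending jobs. Under the toy fair rule the running mix reaches 5 and 5 after five jobs have drained, and dispatch then alternates to hold that split.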

The policies used to do fairshare (ShareTree & Functional Share) are  
also the policies you would investigate if you wanted to allocate  
compute resources on a percentage basis (User A gets 80%, User B gets  
20%).  I've written about this exact scenario here: http:// 

The ShareTree and Functional policies are the most important policies  
for new users and most cluster setups.
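
As a rough illustration of percentage-based allocation, the Python sketch below fills free slots so that each user's running-job count tracks an assigned share. It is a simplified stand-in, not the actual ShareTree or Functional policy; the share numbers, the pick_by_share and fill_slots names, and the "lowest running-to-share ratio wins" rule are assumptions made for the example.

```python
from fractions import Fraction


def pick_by_share(shares, running, pending):
    """Pick the user whose running-job count is furthest below their
    share, considering only users that have pending jobs."""
    candidates = [u for u in shares if pending.get(u, 0) > 0]
    if not candidates:
        return None
    # Fraction keeps the ratio comparison exact (no float rounding ties).
    return min(candidates, key=lambda u: Fraction(running[u], shares[u]))


def fill_slots(shares, pending, slots):
    """Fill an empty cluster of `slots` slots; return running jobs per user."""
    pending = dict(pending)
    running = {u: 0 for u in shares}
    for _ in range(slots):
        user = pick_by_share(shares, running, pending)
        if user is None:
            break
        running[user] += 1
        pending[user] -= 1
    return running


shares = {"A": 80, "B": 20}  # User A gets 80%, User B gets 20%
print(fill_slots(shares, {"A": 100, "B": 100}, 10))
print(fill_slots(shares, {"A": 100, "B": 0}, 10))
```

In this sketch the split only applies while both users have work waiting: with both queues full the 10 slots divide 8 to 2, and when User B has nothing pending User A takes all 10.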

SGE does have other policy mechanisms and the cool thing is that you  
can use *any* SGE policy in *any* order with *any* relative weight.  
The end result is very flexible control over your system at the  
expense of having to do some personal testing to get the right policy  
mix going on your system.

Other policies:

  - Urgency:  You can boost the priority of jobs that request an  
"important" resource
  - Wait time: You can boost the priority of jobs based on how long  
they have been waiting to run in the pending list
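
The idea of mixing policies with relative weights can be sketched as a weighted sum. The Python below is a hypothetical model, not the formula SGE actually uses: the three score names (share, urgency, wait), the 0-to-1 normalization, and the weight values are all invented for illustration. It shows how the same pending jobs come out in a different order when the weights change.

```python
def combined_priority(job, weights):
    """Weighted sum of a job's per-policy scores (each assumed to be
    normalized to the range 0..1)."""
    return sum(weights[name] * job[name] for name in weights)


def dispatch_order(jobs, weights):
    """Return job ids sorted from highest to lowest combined priority."""
    ranked = sorted(jobs, key=lambda j: combined_priority(j, weights),
                    reverse=True)
    return [j["id"] for j in ranked]


# Hypothetical pending jobs with made-up scores per policy.
jobs = [
    {"id": "j1", "share": 0.9, "urgency": 0.0, "wait": 0.1},  # owed share
    {"id": "j2", "share": 0.2, "urgency": 1.0, "wait": 0.1},  # needs an "important" resource
    {"id": "j3", "share": 0.1, "urgency": 0.0, "wait": 1.0},  # waiting longest
]

share_heavy = {"share": 1.0, "urgency": 0.1, "wait": 0.1}
urgency_heavy = {"share": 0.1, "urgency": 1.0, "wait": 0.1}

print(dispatch_order(jobs, share_heavy))
print(dispatch_order(jobs, urgency_heavy))
```

With the share-heavy weights the job owed the most share goes first; with the urgency-heavy weights the job requesting the "important" resource jumps ahead. This is the kind of trade-off you would test on your own system before settling on a policy mix.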

Hope this helps a bit.


On Nov 3, 2005, at 10:40 AM, Ron Price wrote:

> All,
> Does anyone know a good document that talks about load balancing in  
> SGE 6?
> I have read through the Admin, Installation and User manual. I  
> would like to find
> something that gives several types of examples.
> Thanks!!!
> /ron
