[GE users] Load Balancing
dag at sonsorol.org
Thu Nov 3 15:57:29 GMT 2005
You probably found most of the good material already. The keywords
you should zero in on in the docs and other materials are things
related to Grid Engine "policies". The policy mechanism is how grid
engine flexibly implements/enforces/controls "resource allocation"
which includes load balancing.
You may get some better tips/hints/pointers from people on this list
if you explain what you think you would like to be doing with SGE 6.
Without that info I'll try to give the 10,000 foot summary:
If you do nothing but install a default version of Grid Engine you
will get "load balancing" out of the box. Jobs will be distributed
across all your available job slots and the jobs will be allocated to
"least busy" machines first. The allocation will continue
automatically as long as there are jobs awaiting dispatch and (a)
there are available job slots and (b) compute nodes have not tripped
any "I'm too busy!" alarm thresholds.
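To make the default behavior above concrete, here is a toy sketch in Python. It is not how the SGE scheduler is actually implemented; the Node fields, the load units and the load_per_job figure are all made up for illustration. It just shows "least busy first, as long as there are free slots and no alarm has tripped":

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    slots: int      # total job slots on this host
    load: float     # current load figure (hypothetical units)
    alarm: float    # "I'm too busy!" threshold for this host
    used: int = 0   # slots currently occupied


def dispatch(pending, nodes, load_per_job=1.0):
    """Place pending jobs on the least-loaded eligible node.

    Returns (placements, still_pending). A node is eligible when it has
    a free slot and its load is below its alarm threshold.
    """
    placements = []
    remaining = list(pending)
    while remaining:
        eligible = [n for n in nodes if n.used < n.slots and n.load < n.alarm]
        if not eligible:
            break  # no free slots, or every node is in alarm state
        target = min(eligible, key=lambda n: n.load)
        job = remaining.pop(0)
        target.used += 1
        target.load += load_per_job  # assumption: each job adds a fixed load
        placements.append((job, target.name))
    return placements, remaining


if __name__ == "__main__":
    nodes = [Node("a", 2, 0.0, 1.5), Node("b", 2, 0.5, 10.0)]
    placed, waiting = dispatch(["j1", "j2", "j3", "j4", "j5"], nodes)
    print(placed)
    print(waiting)
```

Jobs that cannot be placed simply stay in the pending list until a slot frees up or a node drops back under its threshold.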
You should also understand that "load balancing" in our community
means 'distributing serial, parallel or interactive jobs across many
nodes'. It does not mean "running mysql across many nodes" or "load
balancing 100 apache webservers" etc. Remember that the term "load
balancing" means different things to different groups of people...
If you do nothing to a default install, your jobs will be dispatched
mostly in FIFO order (first in, first out). This means that jobs will
go out for execution in the order in which they were received by the
system. This is fine for small systems, small workloads or SGE
clusters run by single users. This is not so good in situations where
there are multiple people/projects/departments all trying to get work
done at the same time.
Moving away from FIFO scheduling behavior is where SGE "policies"
come into play.
The behavior that most people want (at least initially before they
really start tweaking and tuning) is often described as "fair share".
People want SGE to manage the allocation of compute resources on a
"fair" basis between users, departments, projects or groups.
A practical example of this: Imagine a scenario where User A submits
10,000 jobs to a 10-node cluster and then 1 hour later User B submits
another 10,000 jobs...
- In FIFO scheduling, all 10,000 jobs belonging to User A would
have to finish before User B got a chance to have his/her pending
jobs dispatched.
- In "fairshare" scheduling, User A may be using 100% of available
cluster resources but as User A jobs finish and "drain" out of the
system, the next jobs to be dispatched would belong to User B. User
B's jobs would continue being dispatched (vaulting over any remaining/
pending jobs belonging to User A) until there is a 50-50 mixture of
running jobs between User A and User B. At that point, the SGE
scheduler would probably start alternating job dispatch to keep the
50-50 split in effect.
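The User A / User B scenario can be played out with a small Python simulation. This is only a toy model under stated assumptions: one slot per node, one job finishing per step, and a "fair" rule that picks the user with the fewest running jobs. The real ShareTree and Functional policies work with tickets and are considerably more involved.

```python
from collections import Counter


def fifo_pick(pending, running):
    """FIFO: always dispatch the oldest pending job."""
    return 0


def fair_pick(pending, running):
    """Toy fairshare: oldest pending job of the user with the fewest
    running jobs (ties go to whoever submitted first)."""
    counts = Counter(running)
    user = min(dict.fromkeys(pending), key=lambda u: counts[u])
    return pending.index(user)


def simulate(running, pending, pick, steps):
    """Each step the oldest running job finishes and one pending job,
    chosen by `pick`, takes the freed slot. Returns running jobs per user."""
    running, pending = list(running), list(pending)
    for _ in range(steps):
        if running:
            running.pop(0)
        if pending:
            running.append(pending.pop(pick(pending, running)))
    return Counter(running)


if __name__ == "__main__":
    # User A already fills all 10 slots; User B's jobs arrive later.
    running = ["A"] * 10
    pending = ["A"] * 20 + ["B"] * 20
    print("FIFO:", dict(simulate(running, pending, fifo_pick, 12)))
    print("fair:", dict(simulate(running, pending, fair_pick, 12)))
```

With these numbers FIFO keeps all 10 slots with User A, while the toy fair rule settles at 5 running jobs each after five jobs have drained and then holds that split.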
The policies used to do fairshare (ShareTree & Functional Share) are
also the policies you would investigate if you wanted to allocate
compute resources on a percentage basis (User A gets 80%, User B gets
20%). I've written about this exact scenario here: http://
The ShareTree and Functional policies are the most important policies
for new users and most cluster setups.
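For the percentage case (User A gets 80%, User B gets 20%), the same idea can be sketched in Python by weighting each user's running-job count by an assigned share. The share numbers and the fill_slots helper are hypothetical; this is a rough picture of proportional allocation, not the actual ticket arithmetic SGE uses.

```python
from collections import Counter


def fill_slots(pending, slots, shares):
    """Fill empty slots so running jobs approach the configured shares.

    At each step, dispatch the oldest pending job of the user whose
    running-jobs-to-share ratio is currently lowest.
    """
    pending = list(pending)
    running = []
    while len(running) < slots and pending:
        counts = Counter(running)
        user = min(dict.fromkeys(pending), key=lambda u: counts[u] / shares[u])
        running.append(pending.pop(pending.index(user)))
    return Counter(running)


if __name__ == "__main__":
    pending = ["A"] * 50 + ["B"] * 50
    print(dict(fill_slots(pending, 10, {"A": 80, "B": 20})))
```

On a 10-slot cluster with plenty of pending work from both users, this ends up with 8 slots for User A and 2 for User B.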
SGE does have other policy mechanisms and the cool thing is that you
can use *any* SGE policy in *any* order with *any* relative weight.
The end result is very flexible control over your system at the
expense of having to do some personal testing to get the right policy
mix going on your system.
- Urgency: You can boost the priority of jobs that request an
- Wait time: You can boost the priority of jobs based on how long
they have been waiting to run in the pending list
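The "any policy, any relative weight" idea can be pictured as a weighted sum in Python. The field names (share, urgency, wait) and the weights below are invented for illustration and assume each contribution is already normalized to the 0..1 range; check the scheduler configuration docs for the real parameters before copying anything.

```python
def rank(jobs, weights):
    """Order pending jobs by a weighted sum of per-policy contributions,
    highest combined priority first."""
    def score(job):
        return sum(weights[key] * job[key] for key in weights)
    return sorted(jobs, key=score, reverse=True)


if __name__ == "__main__":
    jobs = [
        {"name": "j1", "share": 0.2, "urgency": 0.0, "wait": 0.9},
        {"name": "j2", "share": 0.8, "urgency": 0.0, "wait": 0.1},
        {"name": "j3", "share": 0.1, "urgency": 1.0, "wait": 0.1},
    ]
    share_heavy = {"share": 1.0, "urgency": 0.1, "wait": 0.1}
    urgency_heavy = {"share": 0.1, "urgency": 1.0, "wait": 0.1}
    print([j["name"] for j in rank(jobs, share_heavy)])
    print([j["name"] for j in rank(jobs, urgency_heavy)])
```

The same three jobs come out in a different order depending on which policy carries the most weight, which is why some testing on your own workload is needed to find the right mix.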
Hope this helps a bit.
On Nov 3, 2005, at 10:40 AM, Ron Price wrote:
> Does anyone know a good document that talks about load balancing in
> SGE 6?
> I have read through the Admin, Installation and User manual. I
> would like to find
> something that gives several types of examples.