[GE users] Scalability of 60u3/u4

Ron Chen ron_chen_123 at yahoo.com
Thu Apr 28 05:18:30 BST 2005


1) 400-500 queue instances
You can scale to thousands of hosts. However, if your
qmaster is on Linux, the select() bug in glibc causes
SGE to crash once the number of hosts exceeds 1024.
SGE 6.0u4 fixes or works around it; alternatively, run
the qmaster on an OS that does not have this problem.
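
To illustrate the kind of workaround involved, here is a minimal
sketch in Python. It is not SGE code. It assumes the problem is the
usual fixed descriptor-set size of select() (FD_SETSIZE, 1024 on
Linux), and it shows poll(), which has no such fixed cap. The helper
name wait_readable and the socket pair are made up for the example.

```python
import select
import socket

def wait_readable(socks, timeout_ms=1000):
    """Return the sockets that are readable, using poll() instead of select().

    poll() takes an explicit list of descriptors, so it is not bound by the
    fixed FD_SETSIZE bitmap that select() uses.
    """
    poller = select.poll()
    by_fd = {}
    for s in socks:
        poller.register(s.fileno(), select.POLLIN)
        by_fd[s.fileno()] = s
    events = poller.poll(timeout_ms)
    return [by_fd[fd] for fd, event in events if event & select.POLLIN]

# Tiny demo: a local socket pair stands in for one daemon connection.
a, b = socket.socketpair()
b.sendall(b"load report")
ready = wait_readable([a])
```

With one connection per host, a server built this way keeps working
past 1024 descriptors, provided the per-process open-file limit is
raised as well.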


2) RSH:
If you only have traditional, serial batch jobs, then
you don't need rsh/rshd or SGE's qrsh/rshd at all. All
traffic goes between the SGE daemons.


3) 150,000 - 300,000 jobs
The largest number of jobs I have ever had was several
tens of thousands. Make sure your master machine is
fast and has lots of memory. A dual-processor or
multi-processor machine may also help, since the
scheduler can then have its own processor (in SGE 6
the qmaster is threaded too).


4) fast execution jobs
SGE 6 has some improvements for throughput scheduling,
and the scheduler configuration has some tuning
parameters that are worth a look.
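
One way to look at those parameters is to capture the output of
"qconf -ssconf" and read it into a dictionary. The sketch below
assumes the output is one "name value" pair per line. The sample text
is made up for illustration and is not taken from a real cluster. The
parameter names (schedule_interval, flush_submit_sec,
flush_finish_sec) are the ones I recall from the sched_conf man page,
so check them against your version.

```python
def parse_sched_conf(text):
    """Parse 'name value' lines into a dict of strings."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf

# Illustrative sample only; the values are assumptions, not recommendations.
sample = """\
schedule_interval    0:0:15
flush_submit_sec     0
flush_finish_sec     0
"""
conf = parse_sched_conf(sample)
```

For many very short jobs, how often the scheduler runs and whether it
is triggered on submit and on finish are the first things I would look
at.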


* * * * * * * * * * * * * * * * * * * * * * * * * * * *

New feature discussion:
Since most of the jobs have a short runtime, grouping
several of them into a bundle would lengthen the
runtime of each dispatched unit and reduce the
scheduler overhead.

There are 2 ways to do it:
1) group the jobs by hand and submit each bundle as a
single job, so users only need to qsub once per
bundle.

2) let the scheduler do the above.

Currently, SGE doesn't group multiple jobs from the
same user into bundles and send the bundles to the
execution hosts.
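
Option 1 (bundling by hand) can be sketched roughly like this. It is
only an illustration: the ./sim command and its argument are
placeholders, the bundle size of 10 is arbitrary, and the helper names
are made up. Each generated script runs its commands one after
another and would be submitted with a single qsub.

```python
def chunk(items, size):
    """Split a list into consecutive bundles of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]

def bundle_script(commands):
    """Build one shell job script that runs the commands sequentially."""
    lines = ["#!/bin/sh", "#$ -cwd"]  # '#$ -cwd': run in the submit directory
    lines.extend(commands)
    return "\n".join(lines) + "\n"

# Hypothetical short jobs; ./sim is a placeholder program.
commands = ["./sim --case %d" % i for i in range(1, 24)]
scripts = [bundle_script(b) for b in chunk(commands, 10)]
# 23 commands in bundles of 10 give 3 scripts, so 3 qsub calls instead of 23.
```

The trade-off is that a bundle occupies its slot until the last
command in it finishes, so very large bundles make load balancing
coarser.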

It would be nice if we could borrow ideas from OpenMP
scheduling:

- Dynamic chunk scheduling
SGE groups a number of jobs from the same user with
the same resource requirements into a bundle (maybe 5
to 10 jobs per bundle?), and dispatches each bundle as
a whole to one execution host.

- Static scheduling
Currently, SGE looks at the load and the queue
sequence number and then decides which hosts are the
best to dispatch the jobs to. But what if you have so
many jobs that there will never be any idle machines?
With static scheduling, you just don't care: give jobs
to hosts that have slots, and with FIFO scheduling,
SGE simply becomes a queue for jobs to go through!
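
A toy model of the two ideas, to make the proposal concrete. This is
not how the SGE scheduler is implemented; it is a sketch under my own
assumptions: jobs are (job_id, user, resources) tuples, a bundle
takes one slot and runs its jobs one after another, and all names are
invented for the example.

```python
from collections import OrderedDict, deque

def make_bundles(jobs, max_size=5):
    """Group pending jobs by (user, resources) into bundles of <= max_size.

    `jobs` is a list of (job_id, user, resources) tuples in submit order.
    """
    groups = OrderedDict()
    for job_id, user, resources in jobs:
        groups.setdefault((user, resources), []).append(job_id)
    bundles = []
    for ids in groups.values():
        for i in range(0, len(ids), max_size):
            bundles.append(ids[i:i + max_size])
    return bundles

def dispatch_fifo(bundles, free_slots):
    """Static FIFO dispatch: give the next bundle to any host with a slot.

    `free_slots` maps host name -> free slot count. Load values and queue
    sequence numbers are deliberately ignored.
    Returns (assignments, still_pending).
    """
    pending = deque(bundles)
    slots = dict(free_slots)
    assignments = []
    for host in sorted(slots):
        while slots[host] > 0 and pending:
            assignments.append((host, pending.popleft()))
            slots[host] -= 1
    return assignments, list(pending)

jobs = [(i, "alice", "mem=1G") for i in range(1, 8)] + [(8, "bob", "mem=1G")]
bundles = make_bundles(jobs, max_size=5)
assigned, waiting = dispatch_fifo(bundles, {"node1": 1, "node2": 1})
```

Here the 8 jobs become 3 bundles, so the scheduler makes 3 dispatch
decisions instead of 8, and none of them needs a load comparison.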

 -Ron


--- Chris Croswhite <csc at cadence.com> wrote:
> Rayson,
> 
> Perfect.  Thanks for the information!
> 
> 
> On Wed, 2005-04-27 at 16:41, Rayson Ho wrote:
> > Several hundred hosts are not a problem with SGE 5.3 or SGE 6.0; many
> > users have larger clusters.
> >
> > For a huge number of jobs, you may need a 64-bit machine so that
> > qmaster/scheduler can allocate more than 2GB of memory.
> >
> > And SGE doesn't use RSH to dispatch jobs to remote hosts; it is used
> > when there are interactive jobs. And you don't even need the normal
> > RSH daemon enabled on the hosts.
> >
> > Rayson
> >
> >
> > >Need some help understanding if SGE 60u3/u4 can meet my needs.  I have
> > >used SGE on a small scale, roughly 100 queue instances with no more than
> > >500 jobs ever queued/running.  My question is can SGE support a couple
> > >hundred queue instances (400-500) with 150,000-300,000 jobs
> > >queued/running?  The environment will be pumping through thousands of
> > >very fast execution jobs (5secs to 60 secs runs), yet literally, there
> > >will be hundreds of thousands of them.
> > >
> > >Too, since SGE uses RSH to dispatch jobs on remote hosts, is there an
> > >issue with having only a single master pushing all these jobs e.g. will
> > >the single host run out of ports if there are 400-500 queue instances?
> > >
> > >Does anyone have experience with this or can give me some suggestions.
> > >
> > >Thanks.
> > >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> > 
> 
> 







