[GE users] Scalability of 60u3/u4

Chris Croswhite csc at cadence.com
Thu Apr 28 05:24:56 BST 2005


Hmm, interesting. I will have to do some homework on SGE
scheduling and job submission.

Do you have an idea when u4 will be out?

Also, there is an RFE to control the number of jobs that a
user can run within a queue, not just globally.  Is this planned for u4?

Thanks!

On Wed, 2005-04-27 at 21:18, Ron Chen wrote:
> 1) 400-500 queue instances
> You can scale to thousands of hosts. However, if
> your qmaster is on Linux, then the select() bug in
> glibc causes SGE to crash if the number of hosts is
> over 1024. SGE 6.0u4 includes a fix and a workaround
> for it, or you can use another OS that does not have
> this problem as your qmaster host.
> 
> 
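A minimal sketch of why roughly 1024 is the ceiling. It assumes the limit in question is the fixed size of select()'s fd_set (FD_SETSIZE, which is 1024 on glibc) and, purely for illustration, that qmaster holds one socket per execution host. The function names and the count of reserved descriptors are hypothetical and not part of SGE.

```python
FD_SETSIZE = 1024  # size of fd_set on glibc; select() cannot watch descriptors >= this


def fits_in_fd_set(fd, fd_setsize=FD_SETSIZE):
    """Return True if descriptor number fd can be passed to select()."""
    return 0 <= fd < fd_setsize


def max_hosts(reserved_fds=3, fd_setsize=FD_SETSIZE):
    """Upper bound on hosts if each host needs one descriptor (an assumption).

    reserved_fds counts descriptors already in use, e.g. stdin/stdout/stderr.
    """
    return fd_setsize - reserved_fds


if __name__ == "__main__":
    print(fits_in_fd_set(1023))  # True
    print(fits_in_fd_set(1024))  # False
    print(max_hosts())           # 1021
```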
> 2) RSH:
> If you only have traditional, serial batch jobs, then
> you don't even need rsh/rshd or SGE's qrsh/rshd at
> all. All traffic stays within the SGE daemons.
> 
> 
> 3) 150,000 - 300,000 jobs
> The largest number of jobs I've ever had was several
> tens of thousands. Make sure your master machine is
> a fast one with lots of memory. A dual-processor
> or multi-processor machine may also help, since the
> scheduler can have its own processor (in SGE 6 the
> qmaster is threaded too).
> 
> 
> 4) fast execution jobs
> SGE 6 has some improvements for throughput scheduling,
> and you can take a look at the scheduler's config for
> some tuning parameters.
> 
> 
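To put rough numbers on the workload from the original question, here is a back-of-the-envelope sketch. It assumes one job slot per queue instance and ignores scheduler and dispatch overhead; both are simplifications, and the function names are only illustrative.

```python
def required_dispatch_rate(slots, runtime_s):
    """Job starts per second needed to keep every slot busy."""
    return slots / runtime_s


def drain_time_hours(jobs, slots, runtime_s):
    """Hours needed to run `jobs` jobs of `runtime_s` seconds on `slots` slots."""
    return jobs * runtime_s / slots / 3600.0


if __name__ == "__main__":
    # 500 slots, 5 second jobs
    print(required_dispatch_rate(500, 5))
    # 500 slots, 60 second jobs
    print(required_dispatch_rate(500, 60))
    # 300,000 jobs of 60 seconds each on 500 slots
    print(drain_time_hours(300_000, 500, 60))
```

Under these assumptions, 5-second jobs on 500 slots need about 100 job starts per second, while 300,000 jobs of 60 seconds each take about 10 hours to drain.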
> * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> New feature discussion:
> Since most of the jobs have short runtimes, grouping
> several of them into a bundle would increase the
> runtime per dispatch and reduce the scheduler overhead.
> 
> There are 2 ways to do it:
> 1) group the jobs by hand, and submit the bundled jobs
> as a job, so users only need to qsub once for each
> bundle.
> 
> 2) let the scheduler do the above.
> 
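Option 1 can be sketched as follows: split a list of short commands into fixed-size bundles and build one shell script per bundle, so each bundle needs a single qsub. The bundle size, the `./run_case` command, and the helper names are assumptions made for illustration; saving and submitting the scripts is left as a comment.

```python
def make_bundles(commands, bundle_size):
    """Split commands into consecutive lists of at most bundle_size items."""
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")
    return [commands[i:i + bundle_size]
            for i in range(0, len(commands), bundle_size)]


def bundle_script(bundle):
    """Return shell script text that runs the bundled commands in order."""
    return "#!/bin/sh\n" + "\n".join(bundle) + "\n"


if __name__ == "__main__":
    cmds = [f"./run_case {i}" for i in range(25)]  # hypothetical short jobs
    for n, bundle in enumerate(make_bundles(cmds, 10)):
        # Each script would be saved to a file and submitted once with qsub.
        print(f"--- bundle {n}: {len(bundle)} commands ---")
        print(bundle_script(bundle), end="")
```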
> Currently, SGE doesn't group multiple jobs from the
> same user as bundles and send the bundles to the
> execution hosts.
> 
> It would be nice if we could borrow ideas from OpenMP
> scheduling:
> 
> - Dynamic chunk scheduling
> SGE groups a number of jobs by the same user with the
> same resource requirements into a bundle (maybe 5 to
> 10 jobs in each bundle?), and dispatches one bundle as
> a whole to an execution host at a time.
> 
> - Static scheduling
> Currently, SGE looks at the load and the queue
> sequence number and then decides which hosts are the
> best to dispatch the jobs to. But what if you have so
> many jobs that there will never be any idle machines?
> With static scheduling, you just don't care: give jobs
> to hosts that have slots, and with FIFO scheduling,
> SGE just becomes a queue for jobs to go through!
> 
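The two ideas can be illustrated with a small simulation. This is not how the SGE scheduler is implemented; it is a sketch under stated assumptions: jobs are plain tuples, a bundle holds up to `chunk` jobs that share a user and a resource string, a bundle occupies one slot, and hosts are filled in the order listed without looking at load. All names and the sample data are hypothetical.

```python
from collections import OrderedDict


def bundle_jobs(jobs, chunk):
    """Group (job_id, user, resources) tuples into bundles of up to `chunk` jobs.

    Jobs are bundled only with jobs from the same user that have the same
    resource requirements; submission order is preserved within each group.
    """
    groups = OrderedDict()
    for job_id, user, resources in jobs:
        groups.setdefault((user, resources), []).append(job_id)
    bundles = []
    for (user, resources), ids in groups.items():
        for i in range(0, len(ids), chunk):
            bundles.append((user, resources, ids[i:i + chunk]))
    return bundles


def dispatch_static(bundles, free_slots):
    """Assign bundles in order to any host that still has a free slot.

    free_slots maps host name to its number of free slots. Load is ignored.
    Returns (assignments, pending).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignments, pending = [], []
    for bundle in bundles:
        host = next((h for h, n in slots.items() if n > 0), None)
        if host is None:
            pending.append(bundle)  # no free slot anywhere: bundle keeps waiting
        else:
            slots[host] -= 1
            assignments.append((host, bundle))
    return assignments, pending


if __name__ == "__main__":
    jobs = [(1, "amy", "arch=lx"), (2, "bob", "arch=lx"),
            (3, "amy", "arch=lx"), (4, "amy", "arch=lx")]
    bundles = bundle_jobs(jobs, chunk=2)
    placed, waiting = dispatch_static(bundles, {"node1": 1, "node2": 1})
    print(placed)
    print(waiting)
```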
>  -Ron
> 
> 
> --- Chris Croswhite <csc at cadence.com> wrote:
> > Rayson,
> > 
> > Perfect.  Thanks for the information!
> > 
> > 
> > On Wed, 2005-04-27 at 16:41, Rayson Ho wrote:
> > > Several hundred hosts is not a problem with SGE5.3 or SGE6.0; many users
> > > have larger clusters.
> > > 
> > > For a huge number of jobs, you may need a 64-bit machine so that
> > > qmaster/scheduler can allocate more than 2GB of memory.
> > > 
> > > And SGE doesn't use RSH to dispatch jobs to remote hosts; RSH is used when
> > > there are interactive jobs. And you don't even need the normal RSH daemon
> > > enabled on the hosts.
> > > 
> > > Rayson
> > > 
> > > 
> > > >Need some help understanding if SGE 60u3/u4 can meet my needs.  I have
> > > >used SGE on a small scale, roughly 100 queue instances with no more than
> > > >500 jobs ever queued/running.  My question is: can SGE support a couple
> > > >hundred queue instances (400-500) with 150,000-300,000 jobs
> > > >queued/running?  The environment will be pumping through thousands of
> > > >very fast execution jobs (5 sec to 60 sec runs), yet literally, there
> > > >will be hundreds of thousands of them.
> > > >
> > > >Also, since SGE uses RSH to dispatch jobs on remote hosts, is there an
> > > >issue with having only a single master pushing all these jobs, e.g. will
> > > >the single host run out of ports if there are 400-500 queue instances?
> > > >
> > > >Does anyone have experience with this, or can anyone give me some
> > > >suggestions?
> > > >
> > > >Thanks.
> > > >


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



