[GE users] Scalability of 60u3/u4

Andreas Haas Andreas.Haas at sun.com
Thu Apr 28 09:50:09 BST 2005


You should be aware of the "max_u_jobs" and "max_jobs" parameters in
sge_conf(5). They let you set high-water marks for the maximum number
of jobs that Grid Engine will accept ("max_u_jobs" per user, "max_jobs"
for the whole system). When the high-water mark is reached, qsub returns
exit code 25. That makes it possible to use a qsub wrapper that sleeps
for a while before retrying the submission. If such a wrapper can be
used in your setup, job streaming can be put into practice. At large
scale this benefits throughput, as there will always be a limit to how
many jobs Grid Engine can schedule efficiently.
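
A minimal sketch of such a wrapper, in Python. It relies only on the
exit code 25 mentioned above. The function name, the 30 second delay
and the optional attempt limit are assumptions made for illustration,
not part of Grid Engine.

```python
import subprocess
import time

JOB_LIMIT_EXIT_CODE = 25   # qsub exit status when the job limit is reached
RETRY_DELAY_SECONDS = 30   # assumed value; tune for your site


def submit_with_retry(qsub_args, run=subprocess.call, sleep=time.sleep,
                      delay=RETRY_DELAY_SECONDS, max_attempts=None):
    """Call qsub until it returns something other than the job-limit code.

    Returns qsub's final exit status. With max_attempts=None the wrapper
    retries forever; otherwise it gives up after that many attempts and
    returns the last status. `run` and `sleep` are injectable for testing.
    """
    attempts = 0
    while True:
        status = run(["qsub"] + list(qsub_args))
        attempts += 1
        if status != JOB_LIMIT_EXIT_CODE:
            return status          # accepted, or failed for another reason
        if max_attempts is not None and attempts >= max_attempts:
            return status          # still over the limit; give up
        sleep(delay)               # wait for running jobs to drain
```

To use it as a drop-in command, a script could end with
sys.exit(submit_with_retry(sys.argv[1:])). Any exit status other than
25 is passed through unchanged, so real submission errors still reach
the caller.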

Regards,
Andreas

On Thu, 28 Apr 2005, Chris Croswhite wrote:

> I will have a look at the parameters for tuning the scheduler.  Curious
> what you mean by "ok utilization"?
>
> As for tuning the scheduler, you recommend the scheduler run at half the
> time of the shortest running job. That would mean every 2.5-5 secs. Is
> that realistic (with the shortest job run being 5 seconds, 300-500 hosts
> and 150k-300k pending jobs), or will the system be sunk just running the
> scheduler?  Perhaps the hope of using a dual V20z w/8G is not
> adequate?!?!?
>
> Thanks immensely.  This type of information is greatly appreciated!!!
>
> > The numbers are no problem. I have seen grids with more than 1000 queue
> > instances or having more than 300k jobs in the system. Both were no
> > problem, if the system is configured right and you have appropriate
> > hardware.
> >
> > The only thing I would worry a bit about is the short run time of your
> > jobs. Using a couple hundred exec hosts with 200k jobs in the system
> > will most likely reduce the utilization of your grid because the
> > scheduler needs to be able to handle the job numbers.
> >
> > When you set up your grid, please have a look at the performance tuning
> > howto. It has not been updated yet and only covers the parameters
> > available in 6.0u1 and earlier. I think I should update it... :-)
> >
> > Also take a look at the scheduler profiling. Based on my own tests, I
> > would say that the job runtime should be two times longer than the max
> > scheduler runtime to achieve okay utilization.
> >
> >
> > >>>Too, since SGE uses RSH to dispatch jobs on remote hosts, is there an
> > >>>issue with having only a single master pushing all these jobs e.g. will
> > >>>the single host run out of ports if there are 400-500 queue instances?
> > >>>
> > >>>Does anyone have experience with this or can give me some suggestions.
> > >>>
> > >>>Thanks.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



