[GE users] scheduler bottleneck?

Ron Chen ron_chen_123 at yahoo.com
Thu Apr 1 01:40:13 BST 2004


Andy,

Would the new scheduler improvements in SGE 6.0 help?

"Grid Engine 6.0 Throughput Scheduler and Scheduling Improvements"

http://gridengine.sunsource.net/project/gridengine/workshop22-24.09.03/proceedings.html

 -Ron

--- Andy Schwierskott <andy.schwierskott at sun.com> wrote:
> Bryan,
> 
> Which SGE version are you using?
> 
> Do you have parallel jobs running? Are they requesting a PE range (like
> "-pe xyz 4-16")?
> What is the profile output now?
> 
> Andy
> 
> > Thanks for the pointer.  I did some of the things suggested by the tuning
> > guide and set the FLUSH_ params to 0, and things appear better now.
> >
> >
> > Charu Chaubal wrote:
> > > Hi Bryan,
> > >
> > > One common approach when you have many jobs on the order of a few
> > > seconds is to increase the number of slots per host.  This way, you can
> > > effectively package several jobs together.  It's true that sometimes you
> > > might have two jobs on the same CPU, but on average the hope is that the
> > > jobs will run such that no two jobs overlap for very long.
> > >
> > > I assume you've looked at the tuning guide here:
> > >
> > > http://gridengine.sunsource.net/project/gridengine/howto/tuning.html
> > >
> > > Regards,
> > >     Charu
> > >
> > >
> > > Bryan Bayerdorffer wrote:
> > >
> > >> I don't know if this is related to my earlier unresolved problem
> > >> (http://gridengine.sunsource.net/servlets/BrowseList?listName=users&by=thread&from=1703).
> > >>
> > >>
> > >> Right now we have a situation in which there are about 3000 pending
> > >> jobs being dispatched to ~50 exec hosts (1 slot each).  The majority
> > >> of these jobs have *extremely* short runtimes: just a few seconds.
> > >> The result is that many hosts are idle for a long time (a minute or
> > >> so) waiting for new jobs to be dispatched.  Users are complaining
> > >> because the total throughput for this job mix is a lot lower than it
> > >> was with LSF.  I'm wondering if the SGEEE scheduler is a bottleneck
> > >> here.  I have the schedule interval set to 10 seconds.  I enabled
> > >> profiling, and it seems that each scheduling run takes about 45
> > >> seconds.  This is on a 450MHz Ultra 60 with local /var/spool/sge, the
> > >> same host that used to run the LSF master.
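A rough model shows why very short jobs suffer here. Assume (a simplification, not a description of SGE internals) that a slot whose job has finished stays idle until the next scheduler run hands it another job. Then each slot runs at most one job per scheduling cycle, and cluster throughput can be no better than the number of slots divided by the cycle length. The sketch plugs in figures from this message: ~50 hosts with 1 slot each, about 3000 pending jobs, and a cycle of roughly 44.6 s (the "schedd run took" value from the profile). The 5 s runtime is an assumed stand-in for "a few seconds", and the function names are hypothetical.

```python
"""Back-of-envelope throughput bound for very short jobs.

Assumption (a simplification, not SGE's documented behaviour): a slot whose
job has finished only gets new work at the next scheduler run, so each slot
runs at most one job per scheduling cycle.
"""


def throughput_bound(hosts: int, slots_per_host: int, cycle_s: float) -> float:
    """Upper bound on jobs completed per second across the whole cluster."""
    return hosts * slots_per_host / cycle_s


def slot_utilization(runtime_s: float, cycle_s: float) -> float:
    """Fraction of a cycle that a slot spends running its one job (capped at 1)."""
    return min(runtime_s / cycle_s, 1.0)


if __name__ == "__main__":
    hosts = 50         # "~50 exec hosts"
    pending = 3000     # "about 3000 pending jobs"
    cycle_s = 44.57    # "schedd run took: 44.570 s"
    runtime_s = 5.0    # assumed value for "just a few seconds"

    for slots in (1, 4):
        rate = throughput_bound(hosts, slots, cycle_s)
        minutes = pending / rate / 60
        print(f"{slots} slot(s)/host: <= {rate:.2f} jobs/s, "
              f">= {minutes:.0f} min to drain {pending} jobs")
    print(f"per-slot utilization: {slot_utilization(runtime_s, cycle_s):.0%}")
```

Under these assumptions the bound is a little over one job per second and each slot is busy for only about 11% of the cycle, which fits hosts sitting idle for most of it. The 4-slot row corresponds to raising the number of slots per host; it ignores CPU sharing between overlapping jobs and any growth in the scheduler's own run time, so treat it as an optimistic upper bound.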
> > >>
> > >> Anything I can tune to improve the performance for short jobs?  I've
> > >> thought of packaging several small jobs as one, but that would require
> > >> big changes in the way batch submission is scripted, and it's also
> > >> somewhat difficult to predict the runtime.
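One way to package short jobs without having to predict their runtimes is to group them by count rather than by time. The sketch below takes a list of command lines, splits it into fixed-size chunks, and writes one /bin/sh wrapper per chunk that runs the commands one after another. Everything here (the chunk size, the `batch_NNNN.sh` file names, the helper names, the `./short_task` commands) is hypothetical, and nothing in it talks to SGE directly.

```python
"""Group many short commands into fewer, longer batch jobs (hypothetical helper)."""
import tempfile
from pathlib import Path
from typing import List


def chunk(commands: List[str], size: int) -> List[List[str]]:
    """Split commands into consecutive groups of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [commands[i:i + size] for i in range(0, len(commands), size)]


def wrapper_script(group: List[str]) -> str:
    """Build a /bin/sh script that runs each command in turn.

    Each command runs in a subshell; a failure is remembered but does not
    stop the remaining commands, and the script exits non-zero if any failed.
    """
    lines = ["#!/bin/sh", "status=0"]
    for cmd in group:
        lines.append(f"( {cmd} ) || status=1")
    lines.append("exit $status")
    return "\n".join(lines) + "\n"


def write_wrappers(commands: List[str], size: int, outdir: Path) -> List[Path]:
    """Write one wrapper file per chunk and return their paths."""
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, group in enumerate(chunk(commands, size)):
        path = outdir / f"batch_{n:04d}.sh"
        path.write_text(wrapper_script(group))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical short tasks standing in for the real job mix.
    cmds = [f"./short_task {i}" for i in range(10)]
    with tempfile.TemporaryDirectory() as d:
        for p in write_wrappers(cmds, 4, Path(d)):
            print(p.name)
            print(p.read_text())
```

Each generated file could then be submitted on its own, e.g. `qsub -S /bin/sh batch_0000.sh`; the `-S` option is there because, depending on the queue configuration, SGE may start job scripts with the queue's shell rather than honoring the `#!` line. A larger chunk size means fewer dispatches but a longer tail when one batch happens to contain slow tasks.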
> > >>
> > >> What's "generate and send orders"?
> > >>
> > >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job ticket calculation: init: 0.320 s, pass 0: 0.180 s, pass 1: 0.000, pass2: 0.000, calc: 0.350 s
> > >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job ticket calculation: init: 0.010 s, pass 0: 0.010 s, pass 1: 0.000, pass2: 0.000, calc: 0.000 s
> > >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE update orders: job orders: 0.590 s, update orders: 0.030 s
> > >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE pending job ticket calculation took 1.500 s
> > >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE active job ticket calculation took 0.020 s
> > >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job sorting took 0.160 s
> > >> Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: SGEEE job dispatching took 8.430 s
> > >> Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: scheduled in 10.600 (u 10.400 + s 0.000 = 10.400): 8 fast, 0 complex, 2817 orders, 80 H, 267 Q, 621 QA, 0 J(qw), 53 J(r), 0 J(s), 0 J(h), 0 J(e), 8 J(x), 2812 J(all) 4 C, 1 ACL, 1 PE, 1 CONF, 116 U, 1 D, 0 PRJ, 1 ST, 0 CKPT, 0 RU
> > >> Tue Mar 30 17:22:00 2004|schedd|hai7|I|PROF: generate and send orders took: 32.020 s
> > >> Tue Mar 30 17:22:01 2004|schedd|hai7|I|PROF: schedd run took: 44.570 s (copying the lists took: 1.400 s)
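In this profile, "generate and send orders" accounts for about 32 of the 44.6 s run, far more than the ticket calculation or job dispatching. A small parser makes that breakdown easy to repeat on later runs. It is a sketch that assumes each PROF entry sits on one line, presumably as in the scheduler's messages file (the mail client wrapped the lines quoted here), and it only picks up entries of the form "... took N s"; the regex and function names are hypothetical.

```python
"""Pull per-phase timings out of SGE scheduler PROF lines (a sketch)."""
import re
from typing import Dict

# Matches entries like "...|PROF: SGEEE job sorting took 0.160 s" and
# "...|PROF: schedd run took: 44.570 s (...)"; other PROF lines are skipped.
PROF_TOOK = re.compile(r"\|PROF: (?P<phase>.+?) took:? (?P<secs>\d+(?:\.\d+)?) s")

# The "took" lines from the profile in this thread, one entry per line.
SAMPLE = """\
Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE pending job ticket calculation took 1.500 s
Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE active job ticket calculation took 0.020 s
Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job sorting took 0.160 s
Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: SGEEE job dispatching took 8.430 s
Tue Mar 30 17:22:00 2004|schedd|hai7|I|PROF: generate and send orders took: 32.020 s
Tue Mar 30 17:22:01 2004|schedd|hai7|I|PROF: schedd run took: 44.570 s (copying the lists took: 1.400 s)
"""


def phase_timings(log_text: str) -> Dict[str, float]:
    """Map each 'PHASE took N s' entry to N; a later entry overwrites an earlier one."""
    timings: Dict[str, float] = {}
    for line in log_text.splitlines():
        m = PROF_TOOK.search(line)
        if m:
            timings[m.group("phase")] = float(m.group("secs"))
    return timings


if __name__ == "__main__":
    t = phase_timings(SAMPLE)
    total = t["schedd run"]
    for phase, secs in sorted(t.items(), key=lambda kv: kv[1], reverse=True):
        if phase != "schedd run":
            print(f"{phase:40s} {secs:7.3f} s {secs / total:6.1%}")
```

On these numbers "generate and send orders" comes out at roughly 72% of the run, against about 19% for job dispatching. Because later entries overwrite earlier ones, a messages file holding several runs should be fed in one run at a time.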
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 







More information about the gridengine-users mailing list