[GE users] scheduler bottleneck?

Charu Chaubal Charu.Chaubal at Sun.COM
Wed Mar 31 01:01:42 BST 2004


Hi Bryan,

One common approach when you have many jobs that run for only a few seconds is 
to increase the number of slots per host.  This way, you effectively package 
several jobs together.  Sometimes two jobs will end up sharing the same CPU, 
but the hope is that, on average, no two jobs will overlap for very long.
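The effect of adding slots can be estimated with a deliberately crude model. In the Python sketch below, every rule is an assumption and not a description of how the Grid Engine scheduler actually works: jobs start only when a scheduling run finishes, each free slot receives at most one job per run, jobs sharing a host split one CPU evenly, and a run that overruns the schedule interval simply delays the next one. The helper names are hypothetical.

```python
"""Crude model of why short jobs leave hosts idle behind a slow scheduler.

Assumptions (not Grid Engine's real dispatch algorithm):
  * jobs start only at the end of a scheduling run,
  * each free slot receives at most one job per run,
  * jobs sharing a host split one CPU evenly.
"""


def cycle_seconds(schedule_interval: float, run_duration: float) -> float:
    # Assumption: a run that takes longer than the interval delays the
    # next run, so the effective cycle is the longer of the two.
    return max(schedule_interval, run_duration)


def host_busy_fraction(job_seconds: float, slots: int, cycle: float) -> float:
    """Fraction of each cycle that a one-CPU host spends doing work."""
    # `slots` jobs arrive together and need slots * job_seconds of CPU in total.
    return min(1.0, slots * job_seconds / cycle)


def min_drain_seconds(pending: int, hosts: int, slots: int, cycle: float) -> float:
    """Lower bound on the time needed to start every pending job."""
    per_cycle = hosts * slots          # at most one job per slot per run
    cycles = -(-pending // per_cycle)  # ceiling division
    return cycles * cycle
```

With the figures from the quoted message below (a 44.57 s run against a 10 s interval, about 3000 pending jobs, 50 one-slot hosts) and an assumed 5 s job, one slot per host gives a busy fraction of about 11% and at least 60 cycles (roughly 45 minutes) to start everything; 8 slots per host gives about 90% and 8 cycles. More slots also mean more orders per run, which may lengthen the run itself, so these numbers are an optimistic upper bound on the benefit.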

I assume you've looked at the tuning guide here:

http://gridengine.sunsource.net/project/gridengine/howto/tuning.html

Regards,
	Charu


Bryan Bayerdorffer wrote:
> I don't know if this is related to my earlier unresolved problem 
> (http://gridengine.sunsource.net/servlets/BrowseList?listName=users&by=thread&from=1703). 
> 
> 
> Right now we have a situation in which there are about 3000 pending jobs 
> being dispatched to ~50 exec hosts (1 slot each).  The majority of these 
> jobs have *extremely* short runtimes---just a few seconds.  The result 
> is that many hosts are idle for a long time (a minute or so) waiting for 
> new jobs to be dispatched.  Users are complaining because the total 
> throughput for this job mix is a lot lower than it was with LSF.  I'm 
> wondering if the SGEEE scheduler is a bottleneck here.  I have the 
> schedule interval set to 10 seconds.  I enabled profiling, and it seems 
> that each scheduling run takes about 45 seconds.  This is on a 450MHz 
> Ultra 60 with local /var/spool/sge, the same host that used to run the 
> LSF master.
> 
> Anything I can tune to improve the performance for short jobs?  I've 
> thought of packaging several small jobs as one, but that would require 
> big changes in the way batch submission is scripted, and it's also 
> somewhat difficult to predict the runtime.
> 
> What's "generate and send orders"?
> 
> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job ticket calculation: init: 0.320 s, pass 0: 0.180 s, pass 1: 0.000, pass2: 0.000, calc: 0.350 s
> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job ticket calculation: init: 0.010 s, pass 0: 0.010 s, pass 1: 0.000, pass2: 0.000, calc: 0.000 s
> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE update orders: job orders: 0.590 s, update orders: 0.030 s
> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE pending job ticket calculation took 1.500 s
> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE active job ticket calculation took 0.020 s
> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job sorting took 0.160 s
> Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: SGEEE job dispatching took 8.430 s
> Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: scheduled in 10.600 (u 10.400 + s 0.000 = 10.400): 8 fast, 0 complex, 2817 orders, 80 H, 267 Q, 621 QA, 0 J(qw), 53 J(r), 0 J(s), 0 J(h), 0 J(e), 8 J(x), 2812 J(all) 4 C, 1 ACL, 1 PE, 1 CONF, 116 U, 1 D, 0 PRJ, 1 ST, 0 CKPT, 0 RU
> Tue Mar 30 17:22:00 2004|schedd|hai7|I|PROF: generate and send orders took: 32.020 s
> Tue Mar 30 17:22:01 2004|schedd|hai7|I|PROF: schedd run took: 44.570 s (copying the lists took: 1.400 s)
> 
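To see where a scheduling run's time goes, the PROF lines can be tallied mechanically. This is a minimal Python sketch, assuming one entry per line in the `date|daemon|host|level|message` layout seen in the log above and timing messages of the form `PROF: <label> took[:] <seconds> s`; other PROF lines (ticket calculation, the `scheduled in` summary) are ignored. `prof_timings` is a hypothetical helper, not part of Grid Engine.

```python
import re
from typing import Dict, Iterable

# Assumed layout, inferred from the excerpt above:
#   date|daemon|host|level|message
# with timing messages like "PROF: <label> took[:] <seconds> s".
PROF_TIMING = re.compile(r"^PROF: (?P<label>.+?) took:? (?P<secs>\d+(?:\.\d+)?) s")


def prof_timings(lines: Iterable[str]) -> Dict[str, float]:
    """Map each 'took' label to its duration in seconds (last one wins)."""
    timings: Dict[str, float] = {}
    for line in lines:
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue  # not a scheduler log entry in the assumed layout
        match = PROF_TIMING.match(fields[4])
        if match:
            timings[match.group("label")] = float(match.group("secs"))
    return timings


sample = [
    "Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: SGEEE job dispatching took 8.430 s",
    "Tue Mar 30 17:22:00 2004|schedd|hai7|I|PROF: generate and send orders took: 32.020 s",
    "Tue Mar 30 17:22:01 2004|schedd|hai7|I|PROF: schedd run took: 44.570 s (copying the lists took: 1.400 s)",
]

t = prof_timings(sample)
total = t["schedd run"]
# Print each phase with its share of the whole run, largest first.
for label, secs in sorted(t.items(), key=lambda kv: -kv[1]):
    print(f"{label:30s} {secs:7.3f} s  {secs / total:6.1%}")
```

On these three entries, "generate and send orders" accounts for about 72% of the 44.57 s run and job dispatching for about 19%, which suggests that in this run most of the time was spent after the dispatch decisions were made.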

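If the submission scripts can be changed after all, bundling does not require predicting runtimes: grouping a fixed number of commands per job makes each bundle take roughly that many times the average job runtime. This is a minimal Python sketch; `bundle` and `run_bundle` are hypothetical helpers, and how each bundle is handed to qsub is left to the existing submission scripts.

```python
import subprocess
import sys
from typing import List, Sequence


def bundle(commands: Sequence[str], per_job: int) -> List[List[str]]:
    """Split commands into consecutive groups of at most `per_job` items."""
    if per_job < 1:
        raise ValueError("per_job must be at least 1")
    return [list(commands[i:i + per_job]) for i in range(0, len(commands), per_job)]


def run_bundle(commands: Sequence[str]) -> int:
    """Run each command in turn through the shell; return how many failed."""
    failures = 0
    for cmd in commands:
        result = subprocess.run(cmd, shell=True)
        if result.returncode != 0:
            failures += 1
            # Report the failure but keep going, so one bad job does not
            # cancel the rest of its bundle.
            print(f"failed ({result.returncode}): {cmd}", file=sys.stderr)
    return failures
```

For example, 3000 commands bundled 10 at a time become 300 jobs, so the scheduler handles a tenth as many dispatches, at the cost of coarser load balancing when a bundle happens to contain slow jobs.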
-- 
####################################################################
# Charu V. Chaubal              # Phone: (650) 786-7672 (x87672)   #
# Grid Computing Technologist   # Fax:   (650) 786-4591            #
# Sun Microsystems, Inc.        # Email: charu.chaubal at sun.com     #
####################################################################

