[GE users] scheduler bottleneck?

Bryan Bayerdorffer bryan.bayerdorffer at analog.com
Wed Mar 31 01:31:28 BST 2004


Thanks for the pointer.  I did some of the things suggested by the tuning 
guide and set the FLUSH_ params to 0, and things appear better now.
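
A back-of-the-envelope sketch of why the scheduling cycle, rather than the exec hosts, can limit throughput for this job mix. It uses the figures from the quoted message below (about 3000 pending jobs, ~50 hosts, ~45 s per scheduling run). The 5 s job runtime, the function name, and the model itself (each slot receives at most one new job per scheduling cycle) are assumptions made for illustration, not measured Grid Engine behaviour.

```python
import math

def drain_estimate(pending_jobs, hosts, slots_per_host, cycle_s, job_runtime_s):
    """Rough estimate, assuming each slot gets at most one job per cycle.

    Only meaningful while job_runtime_s <= cycle_s (very short jobs).
    """
    slots = hosts * slots_per_host
    cycles = math.ceil(pending_jobs / slots)        # cycles needed to dispatch everything
    # Fraction of each cycle during which a host has at least some work,
    # if the jobs on that host happen not to overlap.
    busy_fraction = min(1.0, slots_per_host * job_runtime_s / cycle_s)
    return {
        "cycles": cycles,
        "drain_time_s": cycles * cycle_s,
        "busy_fraction": busy_fraction,
    }

# Figures from the thread; job_runtime_s=5 is an assumed value for "a few seconds".
one_slot = drain_estimate(3000, 50, 1, 45, 5)
four_slots = drain_estimate(3000, 50, 4, 45, 5)

for label, est in (("1 slot/host", one_slot), ("4 slots/host", four_slots)):
    print(f"{label}: {est['cycles']} cycles, "
          f"{est['drain_time_s']} s to drain, "
          f"hosts busy about {est['busy_fraction']:.0%} of the time")
```

Under these assumptions a single slot per host leaves each host idle for most of every cycle, which matches the idle periods described below; shortening the cycle or raising the slot count both shrink that gap.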


Charu Chaubal wrote:
> Hi Bryan,
> 
> One common approach when you have many jobs on the order of a few 
> seconds is to increase the number of slots per host.  This way, you can 
> effectively package several jobs together.  It's true that sometimes you 
> might have two jobs on the same CPU, but, on average, the hope is, the 
> jobs will run such that no two jobs overlap for very long.
> 
> I assume you've looked at the tuning guide here:
> 
> http://gridengine.sunsource.net/project/gridengine/howto/tuning.html
> 
> Regards,
>     Charu
> 
> 
> Bryan Bayerdorffer wrote:
> 
>> I don't know if this is related to my earlier unresolved problem 
>> (http://gridengine.sunsource.net/servlets/BrowseList?listName=users&by=thread&from=1703). 
>>
>>
>> Right now we have a situation in which there are about 3000 pending 
>> jobs being dispatched to ~50 exec hosts (1 slot each).  The majority 
>> of these jobs have *extremely* short runtimes---just a few seconds.  
>> The result is that many hosts are idle for a long time (a minute or 
>> so) waiting for new jobs to be dispatched.  Users are complaining 
>> because the total throughput for this job mix is a lot lower than it 
>> was with LSF.  I'm wondering if the SGEEE scheduler is a bottleneck 
>> here.  I have the schedule interval set to 10 seconds.  I enabled 
>> profiling, and it seems that each scheduling run takes about 45 
>> seconds.  This is on a 450MHz Ultra 60 with local /var/spool/sge, the 
>> same host that used to run the LSF master.
>>
>> Anything I can tune to improve the performance for short jobs?  I've 
>> thought of packaging several small jobs as one, but that would require 
>> big changes in the way batch submission is scripted, and it's also 
>> somewhat difficult to predict the runtime.
>>
>> What's "generate and send orders"?
>>
>> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job ticket calculation: init: 0.320 s, pass 0: 0.180 s, pass 1: 0.000, pass2: 0.000, calc: 0.350 s
>> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job ticket calculation: init: 0.010 s, pass 0: 0.010 s, pass 1: 0.000, pass2: 0.000, calc: 0.000 s
>> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE update orders: job orders: 0.590 s, update orders: 0.030 s
>> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE pending job ticket calculation took 1.500 s
>> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE active job ticket calculation took 0.020 s
>> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job sorting took 0.160 s
>> Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: SGEEE job dispatching took 8.430 s
>> Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: scheduled in 10.600 (u 10.400 + s 0.000 = 10.400): 8 fast, 0 complex, 2817 orders, 80 H, 267 Q, 621 QA, 0 J(qw), 53 J(r), 0 J(s), 0 J(h), 0 J(e), 8 J(x), 2812 J(all) 4 C, 1 ACL, 1 PE, 1 CONF, 116 U, 1 D, 0 PRJ, 1 ST, 0 CKPT, 0 RU
>> Tue Mar 30 17:22:00 2004|schedd|hai7|I|PROF: generate and send orders took: 32.020 s
>> Tue Mar 30 17:22:01 2004|schedd|hai7|I|PROF: schedd run took: 44.570 s (copying the lists took: 1.400 s)
>>
> 
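
To see where the 45 seconds go, the "took" entries in the profile output above can be tabulated against the total run time. The following is a minimal sketch, assuming each PROF entry has been joined onto a single line with the quote prefix removed; the regular expression and the `phase_times` helper are illustrative and not part of Grid Engine.

```python
import re

# Matches entries such as "PROF: SGEEE job sorting took 0.160 s" and
# "PROF: generate and send orders took: 32.020 s".
PROF_RE = re.compile(r"PROF: (.+?) took:? (\d+\.\d+) s")

def phase_times(lines):
    """Return {phase label: seconds} for every PROF 'took' entry found."""
    times = {}
    for line in lines:
        match = PROF_RE.search(line)
        if match:
            times[match.group(1)] = float(match.group(2))
    return times

# Entries copied from the log excerpt in this thread (unwrapped).
SAMPLE = [
    "Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE pending job ticket calculation took 1.500 s",
    "Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE active job ticket calculation took 0.020 s",
    "Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job sorting took 0.160 s",
    "Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: SGEEE job dispatching took 8.430 s",
    "Tue Mar 30 17:22:00 2004|schedd|hai7|I|PROF: generate and send orders took: 32.020 s",
    "Tue Mar 30 17:22:01 2004|schedd|hai7|I|PROF: schedd run took: 44.570 s (copying the lists took: 1.400 s)",
]

times = phase_times(SAMPLE)
total = times["schedd run"]

# Print phases from slowest to fastest with their share of the whole run.
for phase, secs in sorted(times.items(), key=lambda item: -item[1]):
    print(f"{phase:40s} {secs:7.3f} s  {secs / total:6.1%}")
```

On this excerpt the "generate and send orders" entry accounts for roughly 32 of the 44.57 seconds, with job dispatching a distant second at about 8.4 seconds.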

-- 
  .. ..-. ..- -.-. .- -. .-. . .- -.. - .... .. ... --. . - .- .-.. .. ..-. .
Bryan Bayerdorffer       bryan at meatspace.net              bryan at spd.analog.com
                    (Wit's End Computation Center)           (Analog Devices)

"This isn't right.  This isn't even wrong."
                                                                  -- Hans Bethe




