[GE users] scheduler bottleneck?

John Saalwaechter bababooey182 at yahoo.com
Mon Apr 5 21:44:51 BST 2004


Would combining jobs into arrays help?  One thing that wasn't
obvious to me when I first started using SGE was how to combine
disparate jobs into an array.  Initially I only considered
arrays when I had to run the exact same script (or script
template) multiple times.

Consider the case, though, where you need to run many dissimilar
jobs, each with its own independent shell script, but all with
the same resource requirements.  Just copy all the scripts to a
temporary directory, renaming them myjob.1, myjob.2, myjob.3,
myjob.4, ...
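
The copy-and-rename step could be scripted roughly as follows.  This
is only a sketch: the function name stage_jobs and the argument layout
(destination directory first, then the scripts) are my own assumptions,
not anything SGE provides.

```shell
#!/bin/sh
# Hypothetical helper: copy each script named after the first argument
# into the directory $1 as myjob.1, myjob.2, ... and print how many
# scripts were staged.
stage_jobs() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    n=0
    for script in "$@"; do
        n=$((n + 1))
        cp "$script" "$dest/myjob.$n" || return 1
        # The master script runs the copy directly, so it must be executable.
        chmod +x "$dest/myjob.$n" || return 1
    done
    echo "$n"
}
```

For example, N=$(stage_jobs /path/to/my/temporary/dir jobs/*.sh) would
stage everything in a (hypothetical) jobs directory and leave the job
count in N.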

Then create a master shell script called myarrayjob.sh that
basically has this logic:

    #!/bin/sh
    # Run the staged script whose number matches this array task.
    mytmpdir=/path/to/my/temporary/dir
    exec "$mytmpdir/myjob.$SGE_TASK_ID"

Then run "qsub -t 1-N myarrayjob.sh", where N is the number
of jobs.
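
If you would rather not count the scripts by hand, something like the
following can work out the range for you.  Again a sketch only: the
function name task_range is made up, and it assumes the files in the
directory are numbered 1 through N with no gaps.

```shell
#!/bin/sh
# Hypothetical helper: count the myjob.* files in directory $1 and
# print the matching "1-N" range for qsub -t.
task_range() {
    n=0
    for f in "$1"/myjob.*; do
        # An unmatched glob stays literal, so check the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    if [ "$n" -eq 0 ]; then
        echo "no myjob.* files in $1" >&2
        return 1
    fi
    echo "1-$n"
}

# On a submit host this would be used as:
#   qsub -t "$(task_range /path/to/my/temporary/dir)" myarrayjob.sh
```

The function only prints the range, so you can check it before
actually submitting anything.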

If this type of scheme works for your environment, it should
reduce the time each scheduling run takes, since the scheduler
can analyze all the jobs in an array together instead of
individually.

--- Bryan Bayerdorffer <bryan.bayerdorffer at analog.com> wrote:
> Thanks for the pointer.  I did some of the things suggested by the tuning 
> guide and set the FLUSH_ params to 0, and things appear better now.
> 
> 
> Charu Chaubal wrote:
> > Hi Bryan,
> > 
> > One common approach when you have many jobs on the order of a few 
> > seconds is to increase the number of slots per host.  This way, you can 
> > effectively package several jobs together.  It's true that sometimes you 
> > might have two jobs on the same CPU, but, on average, the hope is, the 
> > jobs will run such that no two jobs overlap for very long.
> > 
> > I assume you've looked at the tuning guide here:
> > 
> > http://gridengine.sunsource.net/project/gridengine/howto/tuning.html
> > 
> > Regards,
> >     Charu
> > 
> > 
> > Bryan Bayerdorffer wrote:
> > 
> >> I don't know if this is related to my earlier unresolved problem 
> >> (http://gridengine.sunsource.net/servlets/BrowseList?listName=users&by=thread&from=1703). 
> >>
> >>
> >> Right now we have a situation in which there are about 3000 pending 
> >> jobs being dispatched to ~50 exec hosts (1 slot each).  The majority 
> >> of these jobs have *extremely* short runtimes---just a few seconds.  
> >> The result is that many hosts are idle for a long time (a minute or 
> >> so) waiting for new jobs to be dispatched.  Users are complaining 
> >> because the total throughput for this job mix is a lot lower than it 
> >> was with LSF.  I'm wondering if the SGEEE scheduler is a bottleneck 
> >> here.  I have the schedule interval set to 10 seconds.  I enabled 
> >> profiling, and it seems that each scheduling run takes about 45 
> >> seconds.  This is on a 450MHz Ultra 60 with local /var/spool/sge, the 
> >> same host that used to run the LSF master.
> >>
> >> Anything I can tune to improve the performance for short jobs?  I've 
> >> thought of packaging several small jobs as one, but that would require 
> >> big changes in the way batch submission is scripted, and it's also 
> >> somewhat difficult to predict the runtime.
> >>
> >> What's "generate and send orders?"
> >>
> >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job ticket 
> >> calculation: init: 0.320 s, pass 0: 0.180 s, pass 1: 0.000, pass2: 
> >> 0.000, calc: 0.350 s
> >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job ticket 
> >> calculation: init: 0.010 s, pass 0: 0.010 s, pass 1: 0.000, pass2: 
> >> 0.000, calc: 0.000 s
> >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE update orders: job 
> >> orders: 0.590 s, update orders: 0.030 s
> >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE pending job ticket 
> >> calculation took 1.500 s
> >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE active job ticket 
> >> calculation took 0.020 s
> >> Tue Mar 30 17:21:19 2004|schedd|hai7|I|PROF: SGEEE job sorting took 
> >> 0.160 s
> >> Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: SGEEE job dispatching 
> >> took 8.430 s
> >> Tue Mar 30 17:21:28 2004|schedd|hai7|I|PROF: scheduled in 10.600 (u 
> >> 10.400 + s 0.000 = 10.400): 8 fast, 0 complex, 2817 orders, 80 H, 267 
> >> Q, 621 QA, 0 J(qw), 53 J(r), 0 J(s), 0 J(h), 0 J(e), 8 J(x), 2812 
> >> J(all) 4 C, 1 ACL, 1 PE, 1 CONF, 116 U, 1 D, 0 PRJ, 1 ST, 0 CKPT, 0 RU
> >> Tue Mar 30 17:22:00 2004|schedd|hai7|I|PROF: generate and send orders 
> >> took: 32.020 s
> >> Tue Mar 30 17:22:01 2004|schedd|hai7|I|PROF: schedd run took: 44.570 s 
> >> (copying
> >> the lists took: 1.400 s)
> >>
> > 
> 
> -- 
>   .. ..-. ..- -.-. .- -. .-. . .- -.. - .... .. ... --. . - .- .-.. .. ..-. .
> Bryan Bayerdorffer       bryan at meatspace.net              bryan at spd.analog.com
>                     (Wit's End Computation Center)           (Analog Devices)
> 
> "This isn't right.  This isn't even wrong."
>                                                                   -- Hans Bethe
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 


=====
--
John Saalwaechter <bababooey182 at yahoo.com>





