[GE users] Disappointing initial "qmake" performance benchmarks

Chris Dagdigian dag at sonsorol.org
Thu Jul 29 19:35:33 BST 2004

{ My $.02 }

Grid Engine / PBS / Platform LSF and similar products all incur a 
certain amount of overhead when dispatching tasks to remote nodes. 
This overhead is most visible when you are running lots of very 
short tasks.
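
To put a rough shape on that, here is a small sketch of how a fixed 
per-task dispatch cost eats into short tasks. The 2-second overhead is a 
made-up figure for illustration only, not a measured Grid Engine number, 
and useful_fraction() is just a helper name I picked.

```python
def useful_fraction(task_s, overhead_s):
    """Share of wall-clock time spent on real work for one dispatched task."""
    return task_s / (task_s + overhead_s)


# Hypothetical per-task dispatch overhead in seconds (an assumption, not measured).
OVERHEAD_S = 2.0

if __name__ == "__main__":
    for task_s in (1, 10, 60, 600):
        share = useful_fraction(task_s, OVERHEAD_S)
        print(f"{task_s:>4} s task: {share:.0%} of wall-clock time is useful work")
```

The shorter the task, the larger the share of time lost to dispatch, 
whatever the real overhead turns out to be on your cluster.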

For the type of workflows I commonly see, people generally do not even 
send a job out to the grid if its expected runtime is only ~30 seconds.

A job that needs only 35 seconds on a single host is probably not a 
good test of a distributed resource management system, because it does 
not reflect most people's real-world needs.

I suspect you'll see significant advantages if you start testing with 
more complicated compilation efforts. If you test builds that are 
progressively more involved, you'll probably discover for yourself the 
break-even point at which the overhead of SGE is paid back by the speed 
gains from the distributed compile operation.
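
As a back-of-the-envelope way to estimate where that point lies, here is 
a sketch using the timings quoted below (76 s for a plain "make", 30.6 s 
for "qmake" with up to 12 slots). It assumes the build parallelizes 
perfectly and that all grid overhead is one fixed cost per build. Neither 
assumption is really true (the measured "gmake -j 2" time of 35 s already 
differs from the ideal 38 s), so treat the output as a rough guide only. 
The function names are mine, not anything from SGE.

```python
def local_time(serial_s, cpus):
    """Ideal wall-clock time for a local parallel make on `cpus` processors."""
    return serial_s / cpus


def grid_time(serial_s, slots, overhead_s):
    """Ideal wall-clock time on the grid: fixed overhead plus evenly divided work."""
    return overhead_s + serial_s / slots


def break_even_serial(overhead_s, cpus, slots):
    """Serial build time at which the grid and the local parallel make tie."""
    if slots <= cpus:
        raise ValueError("the grid needs more slots than the local host has CPUs")
    return overhead_s / (1.0 / cpus - 1.0 / slots)


# Timings taken from the quoted message.
SERIAL_S = 76.0   # plain "make" on one node
GRID_S = 30.6     # "qmake ... -pe make 1-12"
SLOTS = 12        # upper end of the requested slot range
CPUS = 2          # dual-processor node, as used for "gmake -j 2"

# Overhead implied by the model; an estimate, not a measurement.
OVERHEAD_S = GRID_S - SERIAL_S / SLOTS

if __name__ == "__main__":
    print(f"implied grid overhead: {OVERHEAD_S:.1f} s")
    print(f"break-even serial time: {break_even_serial(OVERHEAD_S, CPUS, SLOTS):.1f} s")
    for factor in (1, 5, 10):
        serial = SERIAL_S * factor
        ratio = local_time(serial, CPUS) / grid_time(serial, SLOTS, OVERHEAD_S)
        print(f"{serial:6.0f} s serial build: grid is {ratio:.1f}x faster than -j {CPUS}")
```

Under these assumptions a tcpdump-sized build sits just past the 
break-even point, which fits the small gain reported, while builds 
several times larger should show a much clearer benefit.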


Greg Earle wrote:

> I'm running Grid Engine 6.0 on 6 dual-processor 1 GHz UltraSPARC IIIi
> SunFire V240 servers running Solaris 9.
> As a test, I built tcpdump 3.8.3 on one of the machines, and then
> built it again on the Grid using "qmake".
> With "ssh" and "sshd" as the rsh_command/rsh_daemon, respectively,
> the connection overhead (not to mention all the SSH banners spewed
> out) was so bad that the compilation took 1 minute and 1 second.
> I went back to the modified NetBSD "rsh"/"rshd" combination, and
> got it down to 45 seconds initially.
> A single (regular) "make" on one of the Grid nodes resulted in a
> time of 1 minute 16 seconds.
> Then, realizing it was a dual-processor machine, I instead ran
> "gmake -j 2".  That got it down to 35 seconds.
> So, I ran it on the Grid and specified "1-12" in the parallel "make"
> environment:
> sunfire240#6:1:646 [/usr/local/src/networking/tcpdump/tcpdump-3.8.3] # \
> /bin/time qmake -cwd -v PATH -v CC -v CXX -v LM_LICENSE_FILE -pe make 1-12 --
> real       30.6
> user        0.0
> sys         0.1
> 30 seconds vs. 35 seconds isn't much of an improvement!  While I
> didn't expect a 6-fold speed increase from using the Grid, 15%
> improvement doesn't seem very good - I was hoping it would be at
> least twice as fast if not faster.
> (The source tree I'm building on is served via NFS from a NetApp
>  filer, in case it matters - obviously all 6 machines in the Grid
>  configuration can see the same exact tree.)
> Am I doing something obviously wrong, or are my expectations too
> high for the kind of speedups I should see with things like
> distributed makes in a loosely-coupled clustering environment
> like this?
> Thanks,
>     - Greg
