[GE users] Disappointing initial "qmake" performance benchmarks

Andy Schwierskott andy.schwierskott at sun.com
Mon Aug 2 12:48:20 BST 2004


Hi,

Grid Engine starts a new qrsh instance for every make target. This can
indeed create significant overhead for short (compile) tasks.

The solution would be to change qmake so that an existing connection is
kept open and reused for subsequent make targets. However, this is currently
not one of our (Sun's) high-priority items. The engineer (Joachim) who has
considerable experience with qmake, and who could offer advice to anyone
wanting to implement this change themselves, is out of the office for the
next few weeks, so please don't expect many technical comments from our
side until then.

Andy
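
The per-target cost described above, and the break-even point Chris mentions
below, can be made concrete with a back-of-the-envelope model. This is only a
sketch under stated assumptions: targets are independent and of equal length,
make dependencies, NFS latency and scheduler delays are ignored, and the
function names and sample numbers are hypothetical, not measurements from
this thread.

```python
import math


def serial_time(n_tasks: int, work: float) -> float:
    """Wall time when every task runs back to back in a single slot."""
    return n_tasks * work


def per_task_startup_time(n_tasks: int, work: float, overhead: float, slots: int) -> float:
    """Wall time when every task pays its own startup cost
    (one new remote shell per make target)."""
    waves = math.ceil(n_tasks / slots)  # rounds of tasks run side by side
    return waves * (work + overhead)


def reused_connection_time(n_tasks: int, work: float, overhead: float, slots: int) -> float:
    """Wall time when each slot pays the startup cost once and then reuses
    its connection (assumes all slots connect at the same time)."""
    waves = math.ceil(n_tasks / slots)
    return overhead + waves * work


def break_even_work(overhead: float, slots: int) -> float:
    """Per-task work above which per-task startup still beats serial, valid
    when n_tasks is a multiple of slots: (n/p)(t + o) < n*t  <=>  t > o/(p - 1)."""
    if slots < 2:
        raise ValueError("need at least two slots to gain anything")
    return overhead / (slots - 1)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    n, t, o, p = 60, 0.2, 2.0, 12
    print("serial          ", serial_time(n, t))
    print("per-task startup", per_task_startup_time(n, t, o, p))
    print("reused conn.    ", reused_connection_time(n, t, o, p))
    print("break-even work ", break_even_work(o, p))
```

With these made-up inputs (60 targets of 0.2 s each, 2 s of startup per
remote shell, 12 slots), starting a shell per target gives roughly 11 s
against 12 s serial, while reusing one connection per slot would need about
3 s. The break-even work per target is 2/11, or about 0.18 s, which is why
very short compile steps gain so little when each one pays the full startup
cost.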

On Thu, 29 Jul 2004, Chris Dagdigian wrote:

> { My $.02 }
>
> Grid Engine / PBS / Platform LSF and similar products all have a certain 
> amount of overhead to deal with when pushing out tasks to remote nodes. This 
> overhead is most visible when you are trying to do lots of little short 
> tasks.
>
> For the types of workflows I commonly see, people generally do not even send
> a job out to the grid if its expected runtime is only ~30 seconds.
>
> Testing with a job that only needs 35 seconds on a single host is probably 
> not the way to test a distributed resource management system in a way that 
> reflects most people's real world needs.
>
> I suspect you'll see significant advantages if you start testing with more 
> complicated compilation efforts. If you test things that are progressively 
> more involved you'll probably discover for yourself the inflection point at 
> which the overhead of SGE is paid back in speed gains from the distributed 
> compile operation.
>
> -Chris
>
>
> Greg Earle wrote:
>
>> I'm running Grid Engine 6.0 on 6 dual-processor 1 GHz UltraSPARC IIIi
>> SunFire V240 servers running Solaris 9.
>> 
>> As a test, I built tcpdump 3.8.3 on one of the machines, and then
>> built it again on the Grid using "qmake".
>> 
>> With "ssh" and "sshd" as the rsh_command/rsh_daemon, respectively,
>> the connection overhead (not to mention all the SSH banners spewed
>> out) was so bad that the compilation took 1 minute and 1 second.
>> 
>> I went back to the modified NetBSD "rsh"/"rshd" combination, and
>> got it down to 45 seconds initially.
>> 
>> A single (regular) "make" on one of the Grid nodes resulted in a
>> time of 1 minute 16 seconds.
>> 
>> Then, realizing it was a dual-processor machine, I instead ran
>> "gmake -j 2".  That got it down to 35 seconds.
>> 
>> So, I ran it on the Grid and specified "1-12" in the parallel "make"
>> environment:
>> 
>> sunfire240#6:1:646 [/usr/local/src/networking/tcpdump/tcpdump-3.8.3] # \
>> /bin/time qmake -cwd -v PATH -v CC -v CXX -v LM_LICENSE_FILE -pe make 1-12 --
>> 
>> real       30.6
>> user        0.0
>> sys         0.1
>> 
>> 30 seconds vs. 35 seconds isn't much of an improvement!  While I
>> didn't expect a 6-fold speed increase from using the Grid, 15%
>> improvement doesn't seem very good - I was hoping it would be at
>> least twice as fast if not faster.
>> 
>> (The source tree I'm building on is served via NFS from a NetApp
>>  filer, in case it matters - obviously all 6 machines in the Grid
>>  configuration can see the same exact tree.)
>> 
>> Am I doing something obviously wrong, or are my expectations too
>> high for the kind of speedups I should see with things like
>> distributed makes in a loosely-coupled clustering environment
>> like this?
>> 
>> Thanks,
>> 
>>     - Greg
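
Greg's own timings can be put on a common footing. The sketch below only
restates the wall-clock figures reported in his message (1 min 16 s for a
serial make, 35 s for "gmake -j 2", 1 min 1 s and 45 s for the ssh and rsh
qmake runs, 30.6 s for the "-pe make 1-12" run); the dictionary keys and
helper names are hypothetical labels. The arithmetic shows the grid run is
about 1.14x faster than "gmake -j 2" (roughly 13% less wall time), but about
2.5x faster than a plain serial make.

```python
# Wall-clock times in seconds, as reported in the message above.
TIMINGS = {
    "make (serial, one node)": 76.0,
    "gmake -j 2 (one node)": 35.0,
    "qmake over ssh": 61.0,
    "qmake over rsh": 45.0,
    "qmake -pe make 1-12": 30.6,
}


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster candidate is than baseline (> 1 means faster)."""
    return TIMINGS[baseline] / TIMINGS[candidate]


def time_saved(baseline: str, candidate: str) -> float:
    """Fraction of the baseline wall time that candidate saves."""
    return 1.0 - TIMINGS[candidate] / TIMINGS[baseline]


if __name__ == "__main__":
    best = "qmake -pe make 1-12"
    for base in ("make (serial, one node)", "gmake -j 2 (one node)"):
        print(f"{best} vs {base}: {speedup(base, best):.2f}x faster, "
              f"{time_saved(base, best):.0%} less wall time")
```

Comparing against the dual-CPU "gmake -j 2" baseline rather than a serial
make is what makes the grid result look modest.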
