[GE users] Spooling benchmarks

jlb jlb at salilab.org
Sat Apr 10 00:07:37 BST 2010


A while ago I promised some benchmarks comparing classic vs. BDB spooling 
over NFS on a decent-sized cluster.  Having been sidetracked by several 
other things in the interim, I've only just finished them.  I hope folks 
might still find them useful.

Setup: SGE 6.2u4 courtesy binaries.  Tests were done using both classic
        spooling over NFSv3 and BDB spooling over NFSv4.  All machines are
        running up-to-date CentOS-5.
qmaster: HP Proliant DL160 G6, 2x Xeon E5520, 24GB RAM, single 250GB SATA
          drive
NFS server: HP Proliant DL580 G5, 4x Xeon X7350, 28GB RAM, 4x 15K RPM SAS
             drives set up as RAID10
cluster: 570 nodes of various speeds offering 3134 slots

Note that while the qmaster and NFS server were dedicated to this 
benchmarking, the nodes themselves were still in use in our production 
cluster.  Every test was run 3 times and the time reported is the average 
of those runs.
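
As a rough illustration of the averaging described above, here is a
minimal Python sketch.  The function name average_runtime and the
run_once callable are hypothetical, not the scripts actually used for
these tests; run_once stands in for one complete run of a test.

```python
import time
from statistics import mean


def average_runtime(run_once, repeats=3):
    """Call run_once() `repeats` times and return the mean wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.monotonic()  # monotonic clock is unaffected by clock adjustments
        run_once()
        durations.append(time.monotonic() - start)
    return mean(durations)
```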

Test 1:
Submit a single array job with the specified number of tasks.  Each task 
simply runs /bin/true.
Results:
http://www.duke.edu/~jlb17/array.pdf
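
For anyone wanting to reproduce Test 1, a submission along these lines
should work.  This is a sketch only: the exact qsub options used in the
tests are not given above, so the output redirection to /dev/null and
the helper name array_job_command are assumptions.

```python
def array_job_command(num_tasks):
    """Build a qsub command line for one array job of num_tasks /bin/true tasks."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    return [
        "qsub",
        "-b", "y",               # treat the command as a binary, not a job script
        "-t", f"1-{num_tasks}",  # array job with task IDs 1..num_tasks
        "-o", "/dev/null",       # assumption: discard job output
        "-j", "y",               # merge stderr into stdout
        "/bin/true",
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a
submit host.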

Test 2:
Submit the specified number of jobs, with each job simply running 
/bin/true.  Measure the amount of time it takes from the first submission 
until the last job runs.  This means that jobs are being run while more 
are being submitted.
Results:
http://www.duke.edu/~jlb17/subrun.pdf
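
The timing in Test 2 could be sketched as follows.  This is not the
script used for the results above.  It assumes the submitting user has
no other jobs in the system and that plain qstat prints nothing once
that user has no pending or running jobs; the function names and the
one-second poll interval are hypothetical.

```python
import subprocess
import time

SUBMIT_CMD = ["qsub", "-b", "y", "-o", "/dev/null", "-j", "y", "/bin/true"]


def qsub_submit():
    """Submit one /bin/true job."""
    subprocess.run(SUBMIT_CMD, check=True, stdout=subprocess.DEVNULL)


def qstat_has_jobs():
    """Assumption: empty qstat output means this user's jobs have all finished."""
    result = subprocess.run(["qstat"], check=True, capture_output=True, text=True)
    return bool(result.stdout.strip())


def time_submit_and_run(num_jobs, submit=qsub_submit, has_jobs=qstat_has_jobs,
                        poll_interval=1.0):
    """Seconds from the first submission until no jobs remain."""
    start = time.monotonic()
    for _ in range(num_jobs):
        submit()  # earlier jobs may already be running while we submit more
    while has_jobs():
        time.sleep(poll_interval)
    return time.monotonic() - start
```

The submit and has_jobs parameters are injectable so the loop can be
exercised without a live cluster.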

Test 3:
Submit the specified number of jobs, placing a hold on each job.  Release 
the hold, and measure the amount of time it takes for all the jobs to 
execute.
Results:
http://www.duke.edu/~jlb17/run.pdf
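
The hold-and-release step in Test 3 could look something like the
sketch below.  It assumes jobs are submitted with a user hold (qsub -h)
and that job IDs are collected with -terse, then released with qrls
using the comma-separated ID list form from the qhold/qrls man page.
The helper names and the batch size are hypothetical; batching is only
there to keep each command line short.

```python
def held_submit_command():
    """qsub command for one /bin/true job submitted with a user hold."""
    return ["qsub", "-h", "-terse", "-b", "y",
            "-o", "/dev/null", "-j", "y", "/bin/true"]


def release_commands(job_ids, batch_size=500):
    """Build qrls commands that release job_ids in batches of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    commands = []
    for i in range(0, len(job_ids), batch_size):
        batch = job_ids[i:i + batch_size]
        commands.append(["qrls", ",".join(batch)])
    return commands
```

Timing would start just before the first qrls command is run and stop
once all the jobs have executed.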

A few observations:
  o Load on the NFS server was generally low.  It was never over 2 during
    tests with classic spooling and generally around 0.5 during tests with
    BDB spooling.

  o Test 3 was hardest on the qmaster, with high load and "failed gdi
    response" errors during the higher job count tests.  This was true
    for both classic and BDB spooling.

  o I find the total throughput to be pretty low given the number of
    slots involved -- should it be higher?  Any hints as to where to look
    to improve things?
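
For comparing runs, throughput here can be taken as jobs completed per
second of wall-clock time.  A minimal sketch, with hypothetical function
names; the default of 3134 slots is the slot count from the setup above.

```python
def throughput(num_jobs, elapsed_seconds):
    """Jobs completed per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_jobs / elapsed_seconds


def throughput_per_slot(num_jobs, elapsed_seconds, slots=3134):
    """Jobs per second per slot, a rough measure of how busy the slots are kept."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    return throughput(num_jobs, elapsed_seconds) / slots
```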

I still have the test configuration set up (currently configured for BDB 
over NFSv4), and I'm happy to try any suggestions folks have.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=252860
