[GE users] BDB vs. "classic" spooling performance; your feedback

roland roland.dittel at sun.com
Thu Mar 26 14:32:35 GMT 2009


Hi,

the benchmark used in the test is now available. The tar.gz package 
containing the binaries for sol-sparc64, sol-x86, lx24-amd64, lx24-x86, 
darwin-ppc, darwin-x86 is located at:

http://gridengine.sunsource.net/files/documents/7/196/test_spooling_performance.tar.gz

To run the test, simply execute the file test_spooling_performance.sh. A 
small README is included that describes what the test does and what output 
to expect.

We are looking forward to your feedback.

Roland

andy wrote:
> Hi,
> 
> before you read further: we are looking for feedback from users who run
> Grid Engine with BDB spooling enabled. Your feedback, with as many details
> as possible, is highly appreciated! THANKS!
> 
> The question of whether BDB or classic spooling should be used for Grid
> Engine is a much-discussed topic among our users. We tested spooling
> performance under Solaris in our lab. The tests were run with a binary that
> uses the Grid Engine spooling framework and creates, writes, reads and
> deletes 30,000 (pseudo) jobs.
> 
> A note on the BDB spooling results: strictly speaking, we are comparing
> apples and oranges when comparing SGE classic spooling performance with BDB
> spooling. While BDB opens the database with the O_DSYNC flag to ensure
> maximum data integrity in case of outages, we do not use that flag with SGE
> classic spooling. A quick check showed that you would not even want to wait
> for the end of the first test if the SGE classic spooling code used the
> O_DSYNC flag for file operations.
> 
> The key messages from the test results below are:
> 
>   - severe bottlenecks with "classic" NFS spooling
>   - impressive performance improvements with new NFS server and client
>     systems running Solaris 10 and ZFS
>   - local classic spooling on a UFS filesystem is a no-go option
>     (a critical UFS bug fix last year caused a performance regression with
>     the classic spooling)
>   - moderate performance improvements with NFSv4 vs. NFSv3
> 
> scenario            classic       berkeleydb
> --------------------------------------------
> local_x4100M2_zfs      7.1s            10.3s
> local_4450_zfs         6.9s            12.0s
> local_4450_2530_zfs    9.7s             7.3s
> nfsv4_4100M2_4450    190.1s             9.5s
> nfsv3_4100M2_4450    236.0s            11.6s (2) (*)
> 
> local_v440_2540_zfs   25.3s            27.2s
> local_v440_ufs        82.2s            45.0s
> local_v215_ufs       211.0s            48.0s (1)
> nfsv3_v215_v440     1102.0s            31.0s (1)
> 
> local_x4100_ufs      283.0s            54.3s (1)
> nfsv3_x4100_v440     459.6s            15.2s (1)
> 
> (1)/(2) tests repeated only once/twice, not 5 times
> (*) BDB spooling on NFSv3 not supported for production systems!
> 
> The numbers above are averages. In tests with multiple runs, the minimum
> was up to 15% faster than the average.
> 
> Description of scenarios
> ------------------------
> 
> The first 5 tests were run in a non-production network with very little
> network and system load.
> 
> local_x4100M2_zfs:   Sun X4100M2, 2 x dual core 2.6GHz AMD CPU
>                      Solaris 10 8/07
>                      spooling on internal 146GB 10k RPM SAS disk
>                      ZFS formatted
> 
> local_4450_zfs:      Sun X4450, 2 x dual core 3.0 GHz Intel CPU
>                      Solaris 10 5/08
>                      spooling on internal 146GB 10K RPM SAS disk
>                      ZFS formatted
> 
> local_4450_2530_zfs: Sun X4450 (same HW+OS as above)
>                      spooling on local Sun ST 2530, Raid5 12x146GB 15k RPM
>                      SAS disks
>                      ZFS formatted
> 
> nfsv4_4100M2_4450:   client: Sun X4100M2 (as above), using NFSv4 mount
>                      server: Sun X4450 (as above)
> 
> nfsv3_4100M2_4450:   same as above, using NFSv3 mount
> 
> The following tests were run in a lab network with moderate to medium load
> on the network, file server and client systems:
> 
> local_v440_2540_zfs: Sun v440, 4 x 1.6 GHz Sparc CPU
>                      Solaris 10 1/06
>                      Sun StorageTek 2540 (FC), 12x400GB 10k RPM SAS disk
>                      ZFS formatted
> 
> local_v440_ufs:      same hardware as above
>                      internal 146GB 10k RPM SCSI disk
>                      UFS formatted
> 
> local_v215_ufs:      Sun v215, 2 x 1.5 GHz Sparc CPU
>                      internal 146GB 10k RPM SCSI disk
> 
> nfsv3_v215_v440:     client: Sun v215 (as above)
>                      server: Sun v440 (as above)
> 
> This test shows the significant difference between UFS and ZFS spooling on
> almost the same hardware (compare with "local_x4100M2_zfs" test):
> 
> local_x4100_ufs:     Sun X4100, 2 x dual core 2.6 GHz AMD CPU
>                      spooling on internal 146GB 10K RPM SAS disk
>                      UFS formatted
> 
> This test demonstrates the impact of a slower file server (v440 vs. x4450),
> somewhat faster storage (15k RPM disks vs. 10k RPM disks in a ST 25xx
> array), and higher system and network load (compare with "nfsv3_4100M2_4450"
> test):
> 
> nfsv3_x4100_v440:    client: Sun X4100 (as above)
>                      server: Sun v440 (as above)
> 
> From test runs of the Grid Engine performance testsuite (also available as
> part of this project) we see that spooling performance has a heavy impact
> on Grid Engine's overall throughput in clusters with high job rates. If you
> observe Grid Engine throughput bottlenecks, taking a closer look at your
> qmaster server hardware and your spooling setup is a good starting point.
> 
> Over the next few days we will make the test script and binaries available
> for download on several platforms, so that you can run your own tests and
> analyze the impact of spooling on your network. Since the test does not run
> very long, it has little impact on production systems.
> 
> Andy
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=141470
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Roland Dittel               Tel: +49 (0)941 3075-275 (x60275)
Software Engineering        Fax: +49 (0)941 3075-222 (x60222)
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7       mailto:roland.dittel at sun.com
D-93049 Regensburg          http://www.sun.com/gridware
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Registered Office / Sitz der Gesellschaft:
   Sun Microsystems GmbH
   Sonnenallee 1
   D-85551 Kirchheim-Heimstetten
   Germany
Commercial register of the Local Court of Munich /
Handelsregistereintrag Amtsgericht Muenchen:
   HRB 161028
Managing Directors / Geschaeftsfuehrer:
   Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Chairman of the Supervisory Board / Vorsitzender des Aufsichtsrates
   Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=144010

