[GE users] A good spec for a GridEngine 6.1 QMaster
neil.baker at crl.toshiba.co.uk
Fri Jul 18 14:10:32 BST 2008
Hi Grid users,
I've been asked to buy a new server specifically to run QMaster on, to make
it as stable as possible, and I was wondering if anyone could recommend
hardware / operating system combinations.
We've recently decided to migrate to Grid Engine v6.1 and our v6.1 tests so
far show that it's a lot more stable than the v5 versions we've been using
up until now. We have approximately 60 execution hosts, each with 2x 3GHz
Intel Xeon CPUs and 4GB of RAM. We tend to run 4 jobs on an execution host
at once, resulting in 4 x 60 = 240 job slots. We are looking to expand
this to approximately 75 execution hosts (300 slots) in the very near future.
Most jobs last hours, but we have a few jobs that last only 15 minutes.
Also the grid can remain empty for days without being used, but at other
times the grid can be maxed out with a queue of perhaps 1000 jobs or more.
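As a rough sketch of the sizing arithmetic above (concurrent jobs per host
times number of hosts), the Python below reproduces the 240 and 300 figures
and estimates how many full "waves" of jobs a 1000-job backlog would need.
The function names are made up for illustration, and the wave count assumes
every job takes one slot and all jobs in a wave finish together, which is a
simplification.

```python
import math


def total_slots(hosts: int, jobs_per_host: int) -> int:
    """Number of jobs the grid can run at once."""
    return hosts * jobs_per_host


def waves_to_drain(queued_jobs: int, slots: int) -> int:
    """Full rounds of jobs needed to empty the queue.

    Assumes one slot per job and that each round finishes together.
    """
    return math.ceil(queued_jobs / slots)


current_slots = total_slots(hosts=60, jobs_per_host=4)   # today's grid
planned_slots = total_slots(hosts=75, jobs_per_host=4)   # after expansion

print("current slots:", current_slots)
print("planned slots:", planned_slots)
print("waves for 1000 jobs now:", waves_to_drain(1000, current_slots))
print("waves for 1000 jobs after expansion:", waves_to_drain(1000, planned_slots))
```

This only describes the load the QMaster has to schedule; it says nothing
about how much CPU or memory the QMaster itself needs.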
We used to run v5 on a RedHat8 OS and it used to lock up and crash
regularly. Initially I did our v6.1 testing on an old SUN Netra T1 105
440MHz with 512MB RAM, running Solaris 10, but it proved to be too slow. I
was running it with all the execution hosts writing their logs back to that
machine over an NFS share, which perhaps is what caused the performance
problems.
I then turned an execution host into the QMaster (2x 3GHz Intel Xeon CPUs
and 4GB of RAM), running openSUSE 10.0, which is perhaps way more power than
required. Around this time I was advised on this mailing list to store the
execution host logs locally on each execution host. This setup performed
without any problems.
I've been told in the past that SUN hardware and the Solaris OS are more
stable than Intel hardware and Linux, but is this still the case? Does the
Grid Engine QMaster run any better on Solaris than Linux, or are there
preferred distros of Linux? Also I'm told that rack mount servers are
generally more stable than towers and that HP gear is more reliable than
There are far too many permutations to test / evaluate, so I was wondering if
anyone has any success (or horror) stories to help me choose.
Thanks in advance for any help you may be able to offer.