[GE users] A good spec for a GridEngine 6.1 QMaster
landman at scalableinformatics.com
Fri Jul 18 14:29:33 BST 2008
Neil Baker wrote:
> Hi Grid users,
> I've been asked to buy a new server specifically to run QMaster on to
> make it as stable as possible and I was wondering if anyone could
> recommend hardware / operating system combinations.
For simplicity's and sanity's sake, try to keep the OS on the QMaster the
same as on the execution hosts if possible. One less boundary to deal with.
> We used to run v5 on a RedHat8 OS and it used to lock up and crash
> regularly. Initially I did our v6.1 testing on an old SUN Netra T1 105
That was a feature ... er ... bug ... of RH8
> 440MHz with 512MB RAM, running Solaris 10, but it proved to be too
> slow. I was running it with all the execution hosts writing their logs
> back to that machine over an NFS share, which perhaps is what caused the
> performance problems?
Likely. NFS can be incredibly intensive.
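A rough way to check whether a given spool or log directory actually sits
on NFS is to look it up in the mount table. The sketch below is only an
illustration: the function names, the sample mount table, and the paths
(/opt/sge, /var/spool/sge) are hypothetical and not taken from any real
installation. It assumes a Linux-style mount table with the device, mount
point, and filesystem type as the first three fields of each line.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is expected to look like /proc/mounts on Linux:
    "<device> <mount point> <fs type> <options> ..." on each line.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare whole path components so /opt/sge does not match /opt/sgefoo.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path lives on a filesystem whose type starts with 'nfs'."""
    fs_type = filesystem_type(path, mounts_text)
    return fs_type is not None and fs_type.startswith("nfs")


# Hypothetical mount table, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
fileserver:/export/sge /opt/sge nfs rw 0 0
/dev/sda3 /var/spool/sge ext3 rw 0 0
"""
```

On a Linux host the real table could be passed in instead of the sample,
for example is_on_nfs(some_dir, open("/proc/mounts").read()), where
some_dir is whatever directory your execution hosts write to.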
> I then turned an execution host into the QMaster (2x 3GHz Intel Xeon
> CPUs and 4GB of RAM), running openSUSE 10.0 which is perhaps way more
> power than required. Around this time I was advised on this mailing
We have run large clusters from relatively light qmasters. The limits you
usually hit are memory first, and then network bandwidth.
> list to store the execution host logs locally on each execution host.
> This setup performed without any problems.
> I've been told in the past that SUN hardware and the Solaris OS are more
> stable than Intel hardware and Linux, but is this still the case? Does
No, it never really was the case. We have seen as many problems on
RISC-based systems as we have on CISC systems. The OSes also all crash. We
have Linux units up for 550+ days, and Solaris boxes that can't deal
with a bonnie++ run and crash hard. Others may have different mixes and
experiences, but the net of it is that all units have bugs and problems.
As a point of simplification of management, it makes sense to have the
QMaster be the same OS as the execution systems if possible. If
everything is Linux, stay with the same version of Linux. If everything
is IRIX, Solaris, HPUX, AIX, ... stay with that. It complicates
administration to do otherwise.
What I would suggest is a single socket system, and lots of RAM. RAM is
inexpensive these days; get the ECC flavor. Get a quad core CPU (it's
hard to find single core units, and dual core and quad core have similar costs).
> the Grid Engine QMaster run any better on Solaris than Linux, or are
> there preferred distros of Linux? Also I'm told that rack mount servers
> are generally more stable than towers and that HP gear is more reliable
> than Dell.
Hmmm... I hear lots of biases being fed to you, and not really much
data. IMO Solaris isn't better than Linux for this, and HP gear and
Dell gear are quite similar (for a number of reasons) in terms of
reliability. Since the same companies build the Dell and HP (and IBM,
and ...) gear, it should be no surprise.
Rack mounts more stable than towers? If the configs are identical, I
think stability issues may be attributable to location, quality of
power/cooling, and the occasional "server tipping" that happens when you
are in a tower ...
> Far too many permutations to test / evaluate so I was wondering if
> anyone has had any success (or horror) stories to help me choose.
Keep your hardware and software consistent. If you buy Dell, then, buy
Dell. If you get Sun, or HP, or others, then get their stuff. I might
suggest that you speak with some folks in the UK HPC vendor scene. We
work with Streamline and they are quite good, and very helpful to their
customers. Lots of GE expertise. Fire me a note if you need a contact
email of a technical person.
> Thanks in advance for any help you may be able to offer.
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615