[GE users] [OT] Cluster monitoring

emjga matthew.garrett at external.total.com
Thu May 28 15:07:59 BST 2009


We use Ganglia for the overall view and Nagios for specific checks of services and hardware.
We find Nagios easy to configure by hand and very flexible.
Basically, if you can run a command or script, you can get Nagios to run the same thing.
NRPE (the Nagios Remote Plugin Executor) is your helper here.
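To illustrate the "any command or script" point, here is a minimal sketch of a temperature check written as a Nagios plugin. It is not something we run verbatim; the script name check_temp.py, the default thresholds (70/85 C) and the sysfs path /sys/class/thermal/thermal_zone0/temp are assumptions for illustration. What Nagios actually relies on is the plugin convention: print one status line (optionally with performance data after a "|") and exit 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN.

```python
#!/usr/bin/env python3
"""Minimal Nagios-style temperature check (hypothetical check_temp.py).

Assumes a Linux node exposing a thermal zone under /sys/class/thermal;
the default path and thresholds are illustrative assumptions.
"""
import argparse
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def read_temp_celsius(path):
    """Read a sysfs thermal zone value, which is reported in millidegrees Celsius."""
    with open(path) as fh:
        return int(fh.read().strip()) / 1000.0


def classify(temp_c, warn, crit):
    """Map a temperature onto a Nagios state; returns (exit_code, status_line)."""
    if temp_c >= crit:
        code = CRITICAL
    elif temp_c >= warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is shown to operators; text after it is performance data.
    line = f"TEMP {LABELS[code]} - {temp_c:.1f} C | temp={temp_c:.1f};{warn};{crit}"
    return code, line


def main(argv=None):
    parser = argparse.ArgumentParser(description="Nagios-style temperature check")
    parser.add_argument("-w", "--warning", type=float, default=70.0)
    parser.add_argument("-c", "--critical", type=float, default=85.0)
    parser.add_argument("-f", "--file",
                        default="/sys/class/thermal/thermal_zone0/temp")
    args = parser.parse_args(argv)
    try:
        temp_c = read_temp_celsius(args.file)
    except (OSError, ValueError) as exc:
        # Missing or unreadable sensor: report UNKNOWN rather than crash.
        print(f"TEMP UNKNOWN - cannot read {args.file}: {exc}")
        return UNKNOWN
    code, line = classify(temp_c, args.warning, args.critical)
    print(line)
    return code


if __name__ == "__main__":
    sys.exit(main())
```

On each node you would then expose it through NRPE with a line in nrpe.cfg such as command[check_temp]=/usr/local/lib/nagios/plugins/check_temp.py -w 70 -c 85 (install path hypothetical), and call it from the Nagios server with check_nrpe -H <node> -c check_temp. Hardware readings from IPMI (e.g. via ipmitool) could be wrapped the same way; only the reading function changes.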


murple <andreas.kuntzagk at mdc-berlin.de> wrote on 28/05/2009 12:52:39:

> Hi,
> I'm looking for a monitoring solution.
> I want to monitor the status of a 100+ node cluster. What I'm mostly
> interested in is the hardware status of the nodes and some attached
> fileservers.
> (Temperature, disk failure etc.)
> What solutions do you use? Here is a list of open source products I'm
> aware of and my impression (mostly by reading the webpages).
> Maybe somebody could comment on them.
> Ganglia: Seems to be intended more for monitoring the load on the cluster
> Nagios: Very powerful, but also complex to set up? Hardware status via
> IPMI possible?
> SunMC: Confusing interface, can monitor hardware in detail
> Hobbit (now Xymon): Easy to set up, have some experience with smaller
> setups, don't know about hardware monitoring
> regards, Andreas
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199403

