[GE users] [OT] Cluster monitoring

igardais igardais at yahoo.fr
Thu May 28 13:37:54 BST 2009

We use Ganglia and Nagios to monitor both hardware and software status.
Ganglia is good at providing an overall view of the clusters, and Nagios at warning when thresholds are crossed or services become unavailable.

Both are easy to configure and run, although Nagios lacks a web-based configuration console.


----- Original Message ----
From: craffi <dag at sonsorol.org>
To: users at gridengine.sunsource.net
Sent: Thursday, 28 May 2009, 14:05:15
Subject: Re: [GE users] [OT] Cluster monitoring

Most clusters I see use Ganglia together with either Nagios or
BigBrother, customized to meet their local monitoring and reporting
requirements. Once you dig a little into Nagios or BigBrother, it is
not that complicated to write your own probes and custom scripts.
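
As a rough illustration of how small such a probe can be, here is a
minimal sketch of a Nagios-style check in Python. It relies only on the
general plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL,
3 = UNKNOWN, plus one line of status text with optional performance data
after a "|"). The script name check_temp.py, the function names, and the
idea of passing the reading on the command line are assumptions made for
the example; how the temperature is actually obtained (IPMI, lm-sensors,
a vendor tool) is deliberately left out.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_temperature(value_c, warn_c, crit_c):
    """Return (exit_code, status_line) for one temperature reading in Celsius."""
    if value_c is None:
        return UNKNOWN, "TEMP UNKNOWN - no reading available"
    if value_c >= crit_c:
        code = CRITICAL
    elif value_c >= warn_c:
        code = WARNING
    else:
        code = OK
    # Text after "|" is performance data in the form label=value;warn;crit
    line = "TEMP %s - %.1f C | temp=%.1f;%.1f;%.1f" % (
        LABELS[code], value_c, value_c, warn_c, crit_c)
    return code, line


def main(argv):
    """Hypothetical command line: check_temp.py <reading> <warn> <crit>"""
    try:
        value_c, warn_c, crit_c = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments: report UNKNOWN rather than guess.
        print("TEMP UNKNOWN - usage: check_temp.py <reading> <warn> <crit>")
        return UNKNOWN
    code, line = check_temperature(value_c, warn_c, crit_c)
    print(line)
    return code
```

To use it as a standalone plugin you would end the script with
sys.exit(main(sys.argv)) so that the monitoring server sees the exit code;
the thresholds themselves are site-specific and not suggested here.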

Nagios seems to have the larger mindshare along with a great  
collection of community scripts for monitoring just about anything you  
care about.


On May 28, 2009, at 7:52 AM, murple wrote:

> Hi,
> I'm looking for a monitoring solution.
> I want to monitor the status of a 100+ node cluster. What I'm mostly
> interested in is the hardware status of the nodes and some attached
> fileservers.
> (Temperature, disk failure etc.)
> What solutions do you use? Here is a list of open source products I'm
> aware of and my impression (mostly by reading the webpages).
> Maybe somebody could comment on them.
> Ganglia: Seems to be intended more for monitoring the load on the
> cluster
> Nagios: Very powerful, but also complex to set up? Hardware status via
> IPMI possible?
> SunMC: Confusing interface, can monitor hardware in detail
> Hobbit (now Xymon): Easy to set up, I have some experience with a smaller
> setup, don't know about hardware monitoring
> regards, Andreas
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199403
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

More information about the gridengine-users mailing list