[GE users] [OT] Cluster monitoring

craffi dag at sonsorol.org
Thu May 28 13:05:15 BST 2009

Most clusters I see use Ganglia together with either Nagios or
BigBrother, customized to meet their local monitoring and reporting
requirements. Once you dig a little into Nagios or BigBrother, it is
not that complicated to write your own probes and custom scripts.
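
As a rough illustration of how small such a probe can be, here is a
minimal sketch of a Nagios-style check in Python. It relies only on the
plugin convention of printing one status line and returning an exit
code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sensor path, the
thresholds and the function names are assumptions made up for this
example; an IPMI-based check would replace the file read with whatever
your hardware exposes.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style temperature probe (path and thresholds are assumptions)."""
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical sensor source: a Linux sysfs thermal zone reporting millidegrees C.
SENSOR_PATH = "/sys/class/thermal/thermal_zone0/temp"


def evaluate(temp_c, warn=70.0, crit=85.0):
    """Map a temperature reading to (exit_code, status_line)."""
    if temp_c >= crit:
        code = CRITICAL
    elif temp_c >= warn:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data in the form label=value;warn;crit
    line = "TEMP %s - %.1f C | temp=%.1f;%.1f;%.1f" % (
        LABELS[code], temp_c, temp_c, warn, crit)
    return code, line


def main():
    try:
        with open(SENSOR_PATH) as fh:
            temp_c = int(fh.read().strip()) / 1000.0
    except (OSError, ValueError) as exc:
        # An unreadable sensor is reported as UNKNOWN rather than OK.
        print("TEMP UNKNOWN - cannot read sensor: %s" % exc)
        return UNKNOWN
    code, line = evaluate(temp_c)
    print(line)
    return code


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the threshold logic in evaluate() separate from the sensor
read means the same script can be pointed at a different data source
(a disk health query, a fileserver check) by changing only main().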

Nagios seems to have the larger mindshare along with a great  
collection of community scripts for monitoring just about anything you  
care about.


On May 28, 2009, at 7:52 AM, murple wrote:

> Hi,
> I'm looking for a monitoring solution.
> I want to monitor the status of a 100+ node cluster. What I'm mostly
> interested in is the hardware status of the nodes and some attached
> fileservers.
> (Temperature, disk failure etc.)
> What solutions do you use? Here is a list of open source products I'm
> aware of and my impression (mostly by reading the webpages).
> Maybe somebody could comment on them.
> Ganglia: Seems to be intended more for monitoring the load on the cluster
> Nagios: Very powerful, but also complex to set up? Hardware status via
> IPMI possible?
> SunMC: Confusing interface, can monitor hardware in detail
> Hobbit (now Xymon): Easy to set up, have some experience with a smaller
> setup, don't know about hardware monitoring
> regards, Andreas
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199403
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

