Opened 10 years ago

Last modified 10 years ago

#1317 new defect

Event for "unheard": send email

Reported by: Reuti
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 6.2u5
Severity: minor
Keywords: monitoring


When an exechost goes into the "unheard" state, an email should be sent to the administrator who is configured in SGE. This can certainly be done with a cron job that checks qhost or qstat -f, but since the event has already been detected inside SGE, why not also send an email, as is already done for crashed jobs?
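
For reference, a rough sketch of what the cron-side workaround might look like. It assumes the usual qhost column layout (HOSTNAME ARCH NCPU LOAD ...) and that a host which has stopped reporting shows "-" in the LOAD column; both should be checked against the local installation. The function names are made up for illustration.

```python
import subprocess
from typing import List


def unheard_hosts(qhost_output: str) -> List[str]:
    """Return hosts whose LOAD column is "-" in qhost-style output.

    Assumption: LOAD is the fourth column and "-" means no load report.
    """
    hosts = []
    for line in qhost_output.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed separator, the header row,
        # and the "global" pseudo-host (which always shows "-").
        if len(fields) < 4 or fields[0] in ("HOSTNAME", "global"):
            continue
        if fields[3] == "-":
            hosts.append(fields[0])
    return hosts


def run_qhost() -> str:
    """Run qhost and return its standard output (requires SGE in PATH)."""
    result = subprocess.run(["qhost"], capture_output=True, text=True, check=True)
    return result.stdout
```

A cron job would call unheard_hosts(run_qhost()) and mail the result if the list is non-empty.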

Change History (1)

comment:1 Changed 10 years ago by dlove

  • Keywords monitoring added

This should batch the checks across all hosts to avoid mail storms in cases such as
the file server for a stateless cluster going down.
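
One way the batching could look, as a sketch only: collect every host that is currently unheard, drop those already reported in an earlier run, and build a single digest message for the rest. The function name, the subject line, and the idea of keeping an "already reported" set between runs are assumptions made for illustration, not existing SGE behaviour.

```python
from email.message import EmailMessage
from typing import Iterable, Optional, Set


def build_digest(unheard: Iterable[str],
                 already_reported: Set[str],
                 admin: str) -> Optional[EmailMessage]:
    """Build one message covering all newly unheard hosts, or None."""
    # Only hosts not mentioned in an earlier digest are worth a new mail.
    new_hosts = sorted(set(unheard) - already_reported)
    if not new_hosts:
        return None
    msg = EmailMessage()
    msg["To"] = admin  # hypothetical: the administrator address from the SGE configuration
    msg["Subject"] = f"SGE: {len(new_hosts)} host(s) went unheard"
    # One host per line, so a cluster-wide outage still yields a single mail.
    msg.set_content("\n".join(new_hosts) + "\n")
    return msg
```

The returned message could then be handed to smtplib or a local sendmail; how the "already reported" set is persisted between runs is left open here.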

#1322 is related.
