[GE users] Job loss notifications?

Reuti reuti at staff.uni-marburg.de
Mon Apr 11 16:06:38 BST 2005


Hi Simon,

what about using the checkpointing interface? Just leave all entries empty 
and use it only to reschedule a job by setting "when r" (see man 
checkpoint). Then you should get a message for the event. Unfortunately, 
the last time I tried it, I found that the rescheduling did not occur 
until the dead node was back online (but this was 5.3p6). Maybe it's 
fixed already - can you check it?
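The checkpointing setup described above could be sketched as follows. This is a minimal, hypothetical example, not a tested recipe. It assumes SGE 6.x checkpoint-object field names (older releases may differ), the "userdefined" interface, and an illustrative object name "resched_only". It only renders a definition you could load with "qconf -Ackpt FILE" and builds the matching qsub argument list. It runs neither command. According to checkpoint(5), "when r" reschedules a job once its host has been in the unknown state for longer than reschedule_unknown in the cluster configuration. "qsub -m a" requests mail when the job is aborted or rescheduled. In 6.x the queue must also list the object in its ckpt_list.

```python
"""Sketch: a 'reschedule only' checkpoint object for Grid Engine.

Assumptions (not from the original post): SGE 6.x field names, the
hypothetical object name 'resched_only', and interface 'userdefined'.
Nothing here calls qconf or qsub; it only builds text and argv lists.
"""


def render_ckpt(name: str, ckpt_dir: str = "/tmp") -> str:
    """Return a checkpoint-object definition suitable for `qconf -Ackpt FILE`."""
    fields = [
        ("ckpt_name", name),
        ("interface", "userdefined"),  # assumption: no real checkpointing happens
        ("ckpt_command", "none"),      # all command entries left empty
        ("migr_command", "none"),
        ("restart_command", "none"),
        ("clean_command", "none"),
        ("ckpt_dir", ckpt_dir),
        ("signal", "none"),
        ("when", "r"),  # r: reschedule when the job's host goes into unknown state
    ]
    return "".join(f"{key:<16} {value}\n" for key, value in fields)


def qsub_command(script: str, ckpt_name: str, mail_to: str) -> list[str]:
    """Build a qsub argv that attaches the checkpoint object and requests mail.

    -m a: mail when the job is aborted or rescheduled; -M: mail recipient.
    """
    return ["qsub", "-ckpt", ckpt_name, "-m", "a", "-M", mail_to, script]


if __name__ == "__main__":
    print(render_ckpt("resched_only"), end="")
    print(" ".join(qsub_command("job.sh", "resched_only", "simon@example.org")))
```

Running it prints the definition followed by "qsub -ckpt resched_only -m a -M simon@example.org job.sh". The address and the script name are placeholders.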

To check the status of nodes in general, a monitoring tool such as Ganglia 
may give you the information you need.
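For the node side, here is a rough sketch of polling Ganglia. It is hypothetical and rests on several assumptions. First, gmond answers on its default TCP port 8649 with an XML dump of all hosts. Second, each HOST element carries TN (seconds since the host last reported) and TMAX attributes. Third, a host counts as down once TN exceeds 4 * TMAX, which is similar to the heuristic the Ganglia web frontend applies. The function names are made up for illustration. The script only prints, so wiring it to mail or cron is left out.

```python
"""Sketch: flag cluster nodes that have stopped reporting to Ganglia.

Assumptions: gmond serves its XML dump on the default TCP port 8649, and a
host counts as down once TN (seconds since its last report) exceeds
factor * TMAX. Function names are hypothetical.
"""
import socket
import sys
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host: str, port: int = 8649, timeout: float = 10.0) -> bytes:
    """Read the whole XML dump that gmond writes to a connecting client."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection once the dump is sent
                break
            chunks.append(data)
    return b"".join(chunks)


def down_hosts(xml_data, factor: int = 4) -> list[str]:
    """Return names of hosts whose last report is older than factor * TMAX.

    xml_data may be bytes (as returned by fetch_gmond_xml) or str.
    """
    root = ET.fromstring(xml_data)
    down = []
    for host in root.iter("HOST"):
        tn = int(host.get("TN", "0"))
        tmax = int(host.get("TMAX", "0"))
        if tn > factor * tmax:
            down.append(host.get("NAME", "?"))
    return down


# Only touch the network when a gmond host is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    for name in down_hosts(fetch_gmond_xml(sys.argv[1])):
        print(f"node not reporting: {name}")
```

Run it with the gmond host as its only argument. Without an argument it does nothing.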

Cheers - Reuti



Vsevolod (Simon) Ilyushchenko wrote:
> Rayson,
> 
> Thanks, but this only sends an email when a job is deleted/rescheduled 
> through the SGE interface. I'm interested in getting a notification if 
> the job dies for reasons outside of SGE or if the cluster node that runs
> the job dies.
> 
> Simon
> 
> Rayson Ho wrote on 04/07/2005 02:47 PM:
> 
>> For jobs, see whether the -m option of qsub is good enough?
>>
>> Rayson
>>
>>
>>
>>> Is there any way to be automatically notified that a job or a node has
>>> died?
>>>
>>> Thanks,
>>> Simon
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>>
> 





