[GE users] Sending emails when errors in the submitted jobs?

reuti reuti at staff.uni-marburg.de
Wed Mar 10 18:17:44 GMT 2010


Hi,

On 10.03.2010 at 17:50, arvindpetaru wrote:

> I understand that SGE has options to send emails at job BEGINNING,
> ENDING, SUSPENDING and at ABORTION.
>
> Does abortion only mean that the job was removed from the SGE
> environment using the "qdel" command, or are mails also sent for any
> errors encountered during job execution? I understand that the job's
> error file will contain the errors encountered during job/script
> execution, but I need to send emails whenever the job encounters
> any error (errors can be simple script errors or any hardware/network
> link errors). Can this be done using SGE?

A job that fails in the sense of an error in the script is nevertheless a
successful job for SGE. But you can of course enforce such behavior
with tests after each command in the jobscript (which you usually do
anyway if you want to give the user proper feedback about the
encountered problem):

...
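JOBSTEP=step1
myapplication1
rc=$?    # save the exit status before another command overwrites it
case $rc in
     0) ;;    # success, continue with the next step
     1) echo "Dataset not found ($JOBSTEP)"
        kill $$ ;;    # signal the job script's own shell
     2) echo "Output range off bounds ($JOBSTEP)"
        kill $$ ;;
     3) echo "Illegal selection constraints ($JOBSTEP)"
        kill $$ ;;
     *) echo "Unknown error $rc encountered ($JOBSTEP)"
        kill $$ ;;
esac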

myapplication2
...

The job kills itself without using qdel (so the execution host does not
need to be a submit host), and this creates the abortion email.
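
To avoid repeating the test after every command, the check can be wrapped
in a small helper. This is only a sketch: run_step and abort_job are
hypothetical names, not part of SGE, and it reports the raw exit status
instead of the application-specific messages used above. It assumes the
job was submitted with mail on abort enabled (qsub -m a, optionally with
-M for the address).

```shell
#!/bin/sh
# Sketch only: run_step and abort_job are hypothetical helper names.

# Abort the job by signalling the job script's own shell, as in the
# case statement above. Redefine this function to change the behaviour.
abort_job() {
    kill $$
}

# run_step NAME COMMAND [ARGS...]
# Runs COMMAND. On a non-zero exit status the failure is reported on
# stderr (which ends up in the job's error file) and the job is aborted.
run_step() {
    step=$1
    shift
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "Step '$step' failed with exit status $rc" >&2
        abort_job
    fi
    return "$rc"
}
```

In the jobscript each command would then be started as
"run_step step1 myapplication1", "run_step step2 myapplication2" and so on.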

==

Another approach could be to check the size of the error file in a  
job_epilog. When it's not empty, send an email or also kill itself:

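#!/bin/sh
# Mail the error file to the job owner if it is not empty
[ -s "$SGE_STDERR_PATH" ] && mail -s Aborted "$USER" < "$SGE_STDERR_PATH"
# Always exit 0, so an empty error file is not treated as an epilog failure
exit 0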
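
A slightly longer version of such an epilog could look like the sketch
below. It checks the error file mentioned above, assuming SGE exports its
path as SGE_STDERR_PATH and the job number as JOB_ID to the epilog.
notify_if_errors and the MAILER variable are hypothetical names,
introduced here only so the mail command can be swapped out for testing.

```shell
#!/bin/sh
# Sketch of an epilog script. notify_if_errors and MAILER are
# hypothetical names, not part of SGE.

# notify_if_errors FILE RECIPIENT
# Mails the content of FILE to RECIPIENT when FILE exists and is
# not empty. Does nothing otherwise.
notify_if_errors() {
    errfile=$1
    recipient=$2
    if [ -s "$errfile" ]; then
        ${MAILER:-mail} -s "Job ${JOB_ID:-unknown} wrote to its error file" \
            "$recipient" < "$errfile"
    fi
}

# Only act when the path of the error file is known
if [ -n "${SGE_STDERR_PATH:-}" ]; then
    notify_if_errors "$SGE_STDERR_PATH" "$USER"
fi
```

Sending the error file as the mail body saves the user from looking it up
on the file server first.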


> Also, can I send emails for a group of jobs rather than
> individual emails for each job submitted? Say, I submitted 10 jobs
> (not array jobs) but need a single email to know the status (start/
> end/abort/suspend) of all the 10 jobs submitted.

Not for now, but you could add this request to this issue:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=2943

-- Reuti


> Thanks!

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247872

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


