[GE users] failed job emails configuration

wagoodman wgoodman at jcvi.org
Wed Apr 21 19:26:58 BST 2010


Thanks for the input. Where would I find the RFE? Below is a snippet from
one of the thousands of emails I receive when a job fails. This is an issue
that happens maybe once a month:

-----------------------------------snippet---------------------------------------------
Job 4888233 caused action: Job 4888233 set to ERROR
 User        = amoustaf
 Queue       = fast.q at dell-3-3-1.jcvi.org
 Start Time  = <unknown>
 End Time    = <unknown>
failed opening input/output file:04/20/2010 16:07:53 [2846:17364]: error: can't open output file "/local/ifs_projects/GOSII/ahmed/phy
Shepherd trace:
04/20/2010 16:07:53 [1132:17363]: shepherd called with uid = 0, euid = 1132
04/20/2010 16:07:53 [1132:17363]: csp = 0
04/20/2010 16:07:53 [1132:17363]: starting up 6.2u3
04/20/2010 16:07:53 [1132:17363]: setpgid(17363, 17363) returned 0
04/20/2010 16:07:53 [1132:17364]: Child: Starting son(prolog, sgeworker@/usr/local/sge_current/jcvi-scripts/prolog dell-3-3-1.jcvi.org amoustaf 4888233 targetp fast.q, 0);

I have created a folder in MS Outlook, but that is not even a band-aid.
The snippet above is one of approximately 35,000 such emails.
The mail header always reads "SGE 6.2u3: Job 4888233 failed", that is, the
version of SGE, the job ID, and the word "failed". How could I write a wrapper?
BTW, we use MS Exchange. Any ideas?
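
One way to approach the wrapper question is a small script that sits in
front of the real mailer and forwards only the first failure mail per job
ID. The Python sketch below rests on several assumptions: that SGE calls the
configured mailer as "mailer -s <subject> <recipient>" with the body on
stdin (as sge_conf(5) describes for the mailer parameter, to the best of my
knowledge), that the subject has the form quoted above, and that STATE_DIR
and REAL_MAILER are placeholder paths rather than anything from this
cluster. It keeps the persistent information as one marker file per job ID.

```python
import os
import re
import subprocess
import sys

# Assumption: subjects look like "SGE 6.2u3: Job 4888233 failed".
SUBJECT_RE = re.compile(r"\bJob\s+(\d+)\b.*\bfailed\b")

STATE_DIR = "/var/tmp/sge-mail-seen"  # hypothetical directory for marker files
REAL_MAILER = "/bin/mail"             # hypothetical path to the real mailer


def failed_job_id(subject):
    """Return the job ID from a 'Job <id> ... failed' subject, or None."""
    match = SUBJECT_RE.search(subject)
    return match.group(1) if match else None


def should_forward(subject, state_dir=STATE_DIR):
    """True for the first failure mail of a job, False for repeats."""
    job_id = failed_job_id(subject)
    if job_id is None:
        return True  # not a failure notice: pass it through untouched
    os.makedirs(state_dir, exist_ok=True)
    marker = os.path.join(state_dir, job_id)
    try:
        # O_EXCL makes creation atomic, so two concurrent wrappers
        # cannot both believe they saw the job first.
        os.close(os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
    except FileExistsError:
        return False
    return True


def main(argv):
    # Assumed call convention: wrapper -s <subject> <recipient>, body on stdin.
    if len(argv) != 4 or argv[1] != "-s":
        return 2
    subject, recipient = argv[2], argv[3]
    body = sys.stdin.buffer.read()
    if should_forward(subject):
        subprocess.run([REAL_MAILER, "-s", subject, recipient],
                       input=body, check=True)
    return 0
```

A real wrapper would finish by calling sys.exit(main(sys.argv)), and the
marker files would need periodic cleanup; both are left out of the sketch.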

Thanks 

Bill

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Wednesday, April 21, 2010 1:09 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] failed job emails configuration

On 21.04.2010 at 18:52, wagoodman wrote:

> This is the problem I'm having. I have my SGE set up to send email
> alerts to sgealerts (a mailing list that another person and I belong
> to), so I get notified when jobs fail. However, this can be a
> double-edged sword: when users submit array jobs (30 to 50,000), this
> brings MS Outlook to its knees, sometimes rendering my PC helpless. Is
> there a configuration setting to send just one email if a batch or array
> job fails? Please help, the spam is killing me.

There is already an RFE for it. Do all tasks fail if any fails? It could
be put into a mail-wrapper, but that needs some persistent information about
the job context to be stored, as the mail-wrapper no longer has access to
the job's entries. Alternatively, send an email by hand inside the job
script if the error can be trapped, but then only if $SGE_TASK_LAST equals
$SGE_TASK_ID for the current job.

Would this be feasible?
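
The in-script alternative can be sketched as follows, in Python for
illustration. Each task that traps an error leaves a marker file in a shared
directory, and only the task whose SGE_TASK_ID equals SGE_TASK_LAST builds a
single summary to mail. The helper names and the shared failure_dir are
hypothetical; SGE_TASK_ID, SGE_TASK_LAST and JOB_ID are environment
variables SGE sets for a job, with SGE_TASK_ID being "undefined" for
non-array jobs.

```python
import os


def is_last_task(environ):
    """True if SGE_TASK_ID equals SGE_TASK_LAST, or the job is not an array job."""
    task_id = environ.get("SGE_TASK_ID", "undefined")
    if task_id == "undefined":  # SGE's value for non-array jobs
        return True
    return task_id == environ.get("SGE_TASK_LAST")


def record_failure(failure_dir, task_id, reason):
    """Leave one small file per failed task in a shared directory."""
    os.makedirs(failure_dir, exist_ok=True)
    with open(os.path.join(failure_dir, f"task.{task_id}"), "w") as handle:
        handle.write(reason + "\n")


def failure_summary(failure_dir, environ):
    """Return one summary line from the last task, or None otherwise."""
    if not is_last_task(environ) or not os.path.isdir(failure_dir):
        return None
    failed = [name for name in os.listdir(failure_dir)
              if name.startswith("task.")]
    if not failed:
        return None
    job_id = environ.get("JOB_ID", "unknown")
    return f"Job {job_id}: {len(failed)} task(s) failed, details in {failure_dir}"
```

Two caveats apply. The task with the highest index is not guaranteed to
finish last, so the summary can miss failures from tasks that are still
running. And a failure like the one in the snippet above, which appears to
occur before the job script starts, cannot be trapped this way.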

-- Reuti


> Bill
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=254359
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=254360

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=254365
