[GE users] Two failure messages from one task

rayson rayrayson at gmail.com
Fri Jun 12 06:56:44 BST 2009

On 6/11/09, ecs_vuw_kevin <Kevin.Buckley at ecs.vuw.ac.nz> wrote:
> We think that if the local mailer got as far as sending the message out
> but died before clearing the mail spool, it'd still send a copy of it
> with the SAME message ID.
> If SGE didn't get to hear of the mail being sent, because it dies too
> quickly, it might come back up thinking it needed to send again, hence
> the new message ID.
> What's the mechanism for SGE determining if a notification has been sent?

SGE sends the notification email by forking a child that executes the "mailer".

If you look at the function sge_send_mail() in daemons/common/mail.c, you
will find that the logic is simple:

1) create a pipe

2) fork a child

3) child executes the mailer

4) parent writes the content of the email to the pipe, and then waits
for the child.

5) if the parent is woken up by an alarm (60 sec), then execd should
write something in the log before it exits.
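
The steps above can be sketched roughly as follows. This is not the code
from mail.c, just a minimal illustration of the same pipe/fork/exec/wait
pattern. The names run_mailer() and on_alarm(), the return codes, and the
decision to kill the child on timeout are all assumptions of the sketch;
the real function writes a log entry instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired;

/* Records that the timeout expired; delivery also interrupts write()/waitpid(). */
static void on_alarm(int signo) { (void)signo; alarm_fired = 1; }

/*
 * Hypothetical helper (not SGE's sge_send_mail): feed `body` to the program
 * named by argv on its stdin. Returns the program's exit status, -1 on
 * error, or -2 if it did not finish within timeout_sec seconds.
 */
int run_mailer(char *const argv[], const char *body, unsigned timeout_sec)
{
    int fds[2];
    struct sigaction sa, ign, old_alrm, old_pipe;

    assert(argv != NULL && argv[0] != NULL && body != NULL);

    if (pipe(fds) == -1)                      /* 1) create a pipe */
        return -1;

    pid_t pid = fork();                       /* 2) fork a child */
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                           /* 3) child executes the mailer */
        close(fds[1]);
        if (fds[0] != STDIN_FILENO) {
            dup2(fds[0], STDIN_FILENO);       /* mailer reads the message on stdin */
            close(fds[0]);
        }
        execvp(argv[0], argv);
        _exit(127);                           /* only reached if exec failed */
    }

    close(fds[0]);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;                 /* no SA_RESTART: blocked calls fail with EINTR */
    sigemptyset(&sa.sa_mask);
    memset(&ign, 0, sizeof ign);
    ign.sa_handler = SIG_IGN;                 /* a mailer that exits early must not kill us */
    sigemptyset(&ign.sa_mask);
    alarm_fired = 0;
    sigaction(SIGALRM, &sa, &old_alrm);
    sigaction(SIGPIPE, &ign, &old_pipe);
    alarm(timeout_sec);

    /* 4) parent writes the content to the pipe, then waits for the child */
    size_t left = strlen(body);
    while (left > 0 && !alarm_fired) {
        ssize_t n = write(fds[1], body, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;                     /* loop condition rechecks the alarm */
            break;                            /* e.g. EPIPE: mailer stopped reading */
        }
        body += n;
        left -= (size_t)n;
    }
    close(fds[1]);                            /* EOF marks the end of the message */

    int status = 0, result = -1;
    pid_t w = -1;
    while (!alarm_fired && (w = waitpid(pid, &status, 0)) == -1 && errno == EINTR)
        ;                                     /* retry unless the alarm caused the EINTR */
    alarm(0);

    if (w == pid) {
        if (WIFEXITED(status))
            result = WEXITSTATUS(status);
    } else if (alarm_fired) {                 /* 5) woken up by the alarm */
        kill(pid, SIGKILL);                   /* sketch's choice, see the note above */
        waitpid(pid, NULL, 0);
        result = -2;
    }
    sigaction(SIGALRM, &old_alrm, NULL);
    sigaction(SIGPIPE, &old_pipe, NULL);
    return result;
}
```

Note that the mailer is exec'ed exactly once per call, and nothing in this
pattern retries. Like any alarm()-based timeout, the sketch has a small
window in which the signal can arrive just before waitpid() is entered, so
treat it as an illustration rather than production code.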

(I should also clarify that execd forks a child at the very beginning
of this function just to handle the tasks above, while the parent
continues to handle other tasks. So yes, there are 2 children created
in this function.)
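
The outer fork can be pictured like this. Again, this is only a sketch:
run_in_background() and slow_task() are made-up names, and slow_task()
merely stands in for the pipe/fork/exec work done by the first child.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run task(arg) in a forked child and return at once.
 * Returns the child's pid to the caller, or -1 if fork() failed.
 */
pid_t run_in_background(int (*task)(void *), void *arg)
{
    assert(task != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        /* First child: does the slow work, then exits without ever
         * returning into the caller's code. */
        _exit(task(arg) & 0xff);
    }
    return pid;   /* parent: free to carry on with its other tasks */
}

/* Stand-in for the slow mail delivery: sleeps, then reports a status. */
int slow_task(void *arg)
{
    sleep(1);
    return *(int *)arg;
}
```

The caller gets control back immediately, while the child may block for as
long as the mailer takes. The sketch leaves reaping the child to the caller.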

The 60 sec delay looks like a coincidence, but SGE only executes the
mailer once. So assuming that the mailer only sends one copy of the
email, you should not get multiple copies.


> Kevin M. Buckley
> e-Research Programmer
> School of Engineering and Computer Science
> Victoria University of Wellington
> New Zealand

