[GE users] Two failure messages from one task

ecs_vuw_kevin Kevin.Buckley at ecs.vuw.ac.nz
Fri Jun 12 05:43:19 BST 2009


> I haven't dug into the code yet, but by doing a simple diff, ...

Apologies, I should have pasted in my simple diff and saved you the effort!

Both emails are the same - I have multiple copies of such behaviour.

Sadly, at least as far as diagnosing the problem goes, I have been advising folk to submit with the rerun flag, so we may never see the behaviour again!

However, what we have since discovered is that we only seem to get TWO messages when there is a controlled shutdown.

If some oik just turns the public access box off, then we see only the one message, sent when things come back up.

Taking a recent example, the message we received from before the shutdown, presumably sent when SGE knows it's on its way out, isn't in the maillog, but the message sent by something as the box comes back up is.

We think that if the local mailer got as far as sending the message out but died before clearing the mail spool, it'd still send a copy of it with the SAME message ID.

If SGE didn't get to hear of the mail being sent, because it died too quickly, it might come back up thinking it needed to send again, hence the new message ID.
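
For anyone wanting to check their own duplicates against that reasoning, here is a rough sketch, not anything taken from SGE itself. It assumes you have saved the raw notification emails as text and that both copies about one task carry the same Subject line; the function name and the three verdict labels are made up for illustration.

```python
from collections import defaultdict
from email import message_from_string


def classify_duplicates(raw_messages):
    """Group raw emails by Subject and say how each duplicate set looks.

    Assumption: notifications about the same task share a Subject line.
    """
    ids_by_subject = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        subject = (msg.get("Subject") or "").strip()
        message_id = (msg.get("Message-ID") or "").strip()
        ids_by_subject[subject].append(message_id)

    verdicts = {}
    for subject, ids in ids_by_subject.items():
        if len(ids) < 2:
            # Only one copy seen, e.g. the box was simply switched off.
            verdicts[subject] = "single"
        elif len(set(ids)) == 1:
            # Every copy has the same Message-ID: consistent with the
            # mailer re-sending a spooled message.
            verdicts[subject] = "mailer-resend"
        else:
            # At least two distinct Message-IDs: consistent with the
            # sender generating the notification afresh.
            verdicts[subject] = "regenerated"
    return verdicts
```

Our pairs would come out as "regenerated", since the two copies carry different message IDs.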

What mechanism does SGE use to determine whether a notification has been sent?


Kevin M. Buckley

e-Research Programmer
School of Engineering and Computer Science
Victoria University of Wellington
New Zealand

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201619