[GE users] Two failure messages from one task
Kevin.Buckley at ecs.vuw.ac.nz
Mon Jun 15 01:26:38 BST 2009
Firstly, I notice that my posts are still not being approved.
This makes the discussion hard to follow for anyone who is not
logged in, as some of my VUW colleagues will not be.
Oh, and the moderators have not cleared out the off-topic
material in this thread either; I have emailed them directly
but have had no reply as yet (they are Cc'd again here).
As you say, the logic of sending an email message is simple.
However, what I was asking is how the sge_execd, on coming back
up on a machine that had been restarted, would know that it had
already sent an email about a job it considered terminated.
So, once again, here is the sequence:
1. The machine is shutting down and the sge_execd sends an email.
   What state is the sge_execd's "knowledge" of the job in at
   this point?
2. The machine comes back up and the sge_execd starts. It maybe
   (just maybe) sees remnants of the job in the local SGE spool,
   but also "knows" the job has been terminated, so it sends
   another email.
We think there is a "spool" that is not being cleared, either the
mailer's or an SGE one, but we can't see which. Our belief is
that if it were the mailer that had already sent a message but
not cleared its spool, it would restart with the view that it
simply needed to resend the existing message, not create a new one.
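To make the question concrete, here is a minimal Python sketch of the failure mode we suspect. It is a hypothetical model, not how sge_execd actually works (that is exactly what I am asking): it assumes the daemon keeps a per-job "mail sent" flag in a local spool file, here called job_state.json. The file name and all function names are made up for illustration. If the flag is set in memory but never written to the spool before shutdown, the restarted daemon sees the job remnant with the flag still false and mails again.

```python
import json
import os
import tempfile

# HYPOTHETICAL model only; not the real sge_execd spool layout or logic.
STATE_FILE = "job_state.json"  # assumed name, for illustration

def load_state(spool_dir):
    """Read per-job state from the (hypothetical) local spool file."""
    path = os.path.join(spool_dir, STATE_FILE)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)

def save_state(spool_dir, state):
    """Write state via a temp file and atomic rename, so a crash mid-write
    cannot leave a half-written state file behind."""
    path = os.path.join(spool_dir, STATE_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def on_job_terminated(spool_dir, job_id, outbox, persist_flag):
    """Send a termination mail unless the spool records one as already sent."""
    state = load_state(spool_dir)
    job = state.setdefault(job_id, {"mail_sent": False})
    if job["mail_sent"]:
        return
    outbox.append(f"Job {job_id} terminated")  # stands in for the real mailer
    job["mail_sent"] = True
    if persist_flag:
        save_state(spool_dir, state)  # record the "knowledge" before going down

def simulate(persist_flag):
    outbox = []
    with tempfile.TemporaryDirectory() as spool:
        # Job 42 was running when shutdown began, so it has an entry in the spool.
        save_state(spool, {"42": {"mail_sent": False}})
        on_job_terminated(spool, "42", outbox, persist_flag)  # during shutdown
        # After restart a fresh daemon sees the job remnant and checks again.
        on_job_terminated(spool, "42", outbox, persist_flag)
    return outbox

print(len(simulate(False)), len(simulate(True)))
```

With the flag left unpersisted this yields two mails for one job; persisting it before shutdown yields one. A mailer that had simply not cleared its own queue would behave differently again: it would resend the same queued message rather than compose a new one, which is why we doubt the mailer is the culprit.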
Kevin M. Buckley
School of Engineering and Computer Science
Victoria University of Wellington