[GE users] debugging mailer oddness on SGE 6

Chris Dagdigian dag at sonsorol.org
Wed Nov 30 21:16:36 GMT 2005

Thanks Reuti --

On Nov 30, 2005, at 3:59 PM, Reuti wrote:
>> The only interesting thing in the logs is an older message from a  
>> few days ago:
>>> 11/23/2005 13:56:28|execd|xxx|E|mailer had timeout - killing
>>> 11/23/2005 13:56:28|execd|xxx|E|mailer exited with exit status = 1
>> Long shot but ...
> Nothing in /var/log/mail besides your commandlines tests? When I  
> configured postfix, I had to stop/start postfix after the changes,  
> as a reload wasn't enough to get all changes accepted by postfix.  
> But I never had the issue that SGE gave up to use it (u4 & u6).

"postfix reload" worked fine for me on an idle testbed cluster - I  
instantly saw SGE connection attempts in the postfix logs after  
adding the relayhost parameter.

The odd thing I am seeing on this cluster is *zero* mail log
indications that SGE made any sort of delivery attempt at all. The local
node mail logs just don't change when SGE jobs are running.
This is why I was sort of hoping that the previous "|E|" error may
have caused SGE to stop attempting delivery.
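
In case it helps anyone check other exec hosts for the same older
mailer errors, here is a rough Python sketch that picks them out of
execd message lines. It assumes every line uses the five-field,
pipe-delimited layout seen in the two lines quoted above. The names
ExecdMessage, parse_line and mailer_errors are my own hypothetical
helpers, not anything shipped with SGE, and where the messages file
lives depends on your installation.

```python
from typing import Iterable, List, NamedTuple, Optional


class ExecdMessage(NamedTuple):
    """One log line, split on '|' (layout assumed from the quoted lines)."""
    timestamp: str
    daemon: str
    host: str
    level: str
    text: str


def parse_line(line: str) -> Optional[ExecdMessage]:
    """Split a line into five fields; return None if it does not fit."""
    # maxsplit=4 keeps any '|' inside the message text intact.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return ExecdMessage(*parts)


def mailer_errors(lines: Iterable[str]) -> List[ExecdMessage]:
    """Return error-level ('E') entries whose text mentions the mailer."""
    hits = []
    for line in lines:
        msg = parse_line(line)
        if msg is not None and msg.level == "E" and "mailer" in msg.text:
            hits.append(msg)
    return hits


# The two lines quoted earlier in this thread, used as sample input.
sample = [
    "11/23/2005 13:56:28|execd|xxx|E|mailer had timeout - killing",
    "11/23/2005 13:56:28|execd|xxx|E|mailer exited with exit status = 1",
]

for msg in mailer_errors(sample):
    print(msg.timestamp, msg.host, msg.text)
```

To run it against a real file, pass an open file object to
mailer_errors() instead of the sample list. It only reports what was
logged; it says nothing about why SGE is not attempting delivery now.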

I need to wait for a cluster node to drain and then we'll restart  
postfix and SGE on an exec host and see what happens. Thanks!

