[GE users] debugging mailer oddness on SGE 6

DGS dgs at gs.washington.edu
Wed Nov 30 20:59:29 GMT 2005


> 
> I've got a vanilla 6.0u3 SGE system with the standard mailer  
> configured as "/usr/bin/mail". Under the hood on each compute node is  
> a postfix MTA with a relayhost parameter that relays SMTP traffic  
> along to the qmaster node for transit to the public network.
> 
> Running /usr/bin/mail from the command line works as expected on  
> each compute node. Local logs show the connection and  
> successful relay.

Did you actually set the '/usr/bin/mail' path in your cluster
configuration?  I believe the mailer path is '/bin/mail' in
the default installation.
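
One way to check which mailer the cluster is actually configured to use is to read the "mailer" entry from the output of "qconf -sconf". The following is only a sketch: it assumes the output is laid out as one "name value" pair per line, and the helper names parse_mailer and configured_mailer are made up for illustration.

```python
import shutil
import subprocess


def parse_mailer(conf_text):
    """Return the value of the 'mailer' parameter, or None if it is absent.

    Assumes each configuration line looks like 'name   value'.
    """
    for line in conf_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "mailer":
            return parts[1].strip()
    return None


def configured_mailer():
    """Ask qconf for the global configuration and extract the mailer path."""
    if shutil.which("qconf") is None:
        # Not on an SGE host, or the SGE environment has not been sourced.
        return None
    result = subprocess.run(
        ["qconf", "-sconf"], capture_output=True, text=True, check=False
    )
    return parse_mailer(result.stdout)


if __name__ == "__main__":
    print(configured_mailer())
```

If the path printed differs from the binary that was tested by hand, that mismatch is the first thing to rule out.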

> 
> But currently when SGE email notification is requested, the jobs run  
> and there is no indication that any sort of mail delivery attempt was  
> made at all. The smtp logs on each compute node show no connection/ 
> delivery attempts whatsoever.
> 
> The only interesting thing in the logs is an older message from a few  
> days ago:
> 
> >11/23/2005 13:56:28|execd|xxx|E|mailer had timeout - killing
> >11/23/2005 13:56:28|execd|xxx|E|mailer exited with exit status = 1
> 
> Long shot but ...
> 
> I'm wondering if a mailer error in SGE is similar to a queue that  
> drops into state "E" in that it persists until "something" is done.  
> If SGE encountered a fatal mailer error in the past, will it stop  
> trying to send email for future jobs? Do I need to restart the execd  
> daemons on a compute node or something?

I doubt it.  I use 'sSMTP', a very simple and dumb mailer, on my cluster
nodes.  All it does is relay mail to a hub.  If the hub is down, it simply
fails and drops the message.  I've had the mail hub go down once or twice,
and stopped getting notifications from SGE.  But just restarting the mail
hub was sufficient to get things working again.

David S.
 

> 
> If that is not the case I think my next step is going to involve writing  
> a custom mailer script that can do some verbose logging. Does anyone  
> have a simple mailer script or wrapper that I can use for this  
> purpose? All I want to do is drop a custom mailer script in place  
> that is capable of logging the fact that it has actually been  
> invoked...then it can just pass along its data to /usr/bin/mail as  
> expected.
> 
> Regards,
> Chris
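
A logging wrapper of the kind asked for above could look roughly like this. It is a minimal sketch, not a tested SGE component: it assumes the execd calls the mailer with ordinary command-line arguments (such as a subject and a recipient) and supplies the message body on standard input. The log path /tmp/sge_mailer_wrapper.log is a hypothetical choice; /usr/bin/mail is the mailer named in the original post.

```python
#!/usr/bin/env python3
import datetime
import os
import subprocess
import sys

REAL_MAILER = "/usr/bin/mail"              # mailer named in the original post
LOG_FILE = "/tmp/sge_mailer_wrapper.log"   # hypothetical log location


def log(message, log_file=LOG_FILE):
    """Append one timestamped line to the log file."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(log_file, "a") as handle:
        handle.write(f"{stamp} pid={os.getpid()} {message}\n")


def main(argv, stdin_bytes, real_mailer=REAL_MAILER, log_file=LOG_FILE):
    """Record the invocation, then hand arguments and body to the real mailer."""
    log(f"invoked args={argv!r} body_bytes={len(stdin_bytes)}", log_file)
    try:
        result = subprocess.run(
            [real_mailer] + list(argv), input=stdin_bytes, check=False
        )
    except OSError as exc:
        # The real mailer is missing or not executable.
        log(f"could not start {real_mailer}: {exc}", log_file)
        return 1
    log(f"{real_mailer} exited with status {result.returncode}", log_file)
    return result.returncode


# Assumption: the scheduler always passes arguments; with none there is
# no recipient to mail, so the script does nothing.
if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:], sys.stdin.buffer.read()))
```

To try it, the script would need to be executable at the same path on every execution host, with the "mailer" entry in the cluster configuration (editable with "qconf -mconf") pointing at it. If the log stays empty after a job that requested notification, the execd never invoked the mailer at all; if it shows a non-zero exit status, the problem lies with the real mailer.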
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net




