[GE users] debugging mailer oddness on SGE 6

Chris Dagdigian dag at sonsorol.org
Fri Dec 2 13:04:13 GMT 2005


Restarting SGE had no effect.

Yeah, this is an OS X cluster. I've got 2 systems showing symptoms
where it appears that SGE is failing to call /usr/bin/mail
successfully. There is no output in the mail logs at all, and
occasionally we see SGE errors about mailer time-outs:

> 11/23/2005 13:56:28|execd|xxx|E|mailer had timeout - killing
> 11/23/2005 13:56:28|execd|xxx|E|mailer exited with exit status = 1

All other OS X clusters I have access to are handling interaction  
with /usr/bin/mail just fine.

Yesterday I wrote my own mail-wrapper.pl script that logs @ARGV and
STDIN to a logfile in /tmp, and it confirmed that SGE was doing the
right thing: the subject, recipient and mail message data were all
correct. For some reason (and only on these 2 systems) the hand-off
to the designated mailer (/usr/bin/mail in this case) never
completes.
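
For anyone who wants to try the same kind of check, here is a rough /bin/sh sketch of a logging-only wrapper. It is not the mail-wrapper.pl mentioned above; the LOG_DIR variable, the log file name and the function name are made up for illustration. It records each argument on its own line, followed by whatever arrives on stdin, and sends nothing.

```shell
#!/bin/sh
# Logging-only mailer wrapper (sketch): records how it was called, sends nothing.
# LOG_DIR and the log file naming are assumptions made for this example.
LOG_DIR="${LOG_DIR:-/tmp}"

log_invocation() {
    log="$LOG_DIR/mail-wrapper.$$.log"
    {
        echo "date: $(date)"
        echo "argc: $#"
        # One argument per line, bracketed so embedded spaces stay visible.
        for arg in "$@"; do
            printf 'arg: [%s]\n' "$arg"
        done
        echo "--- stdin ---"
        # Copy the message body from stdin into the log.
        cat
    } >> "$log"
}

# A mailer is expected to be called with arguments; do nothing when called bare.
if [ "$#" -gt 0 ]; then
    log_invocation "$@"
fi
```

Bracketing each argument makes it easy to see whether a multi-word subject arrived as one argument or was split into several.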

Last night I replaced mail-wrapper.pl with a new version that  
invokes /usr/bin/mail itself after logging to /tmp. I'll know today  
if it worked. I suspect it will work, and if not, it will at least  
log more usable debug info.
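
A log-then-forward wrapper along these lines might look roughly like the /bin/sh sketch below. Again, this is not the actual mail-wrapper.pl; MAILER, LOG_DIR, the file names and the function name are assumptions for illustration. It saves the body, logs the arguments and body, runs the real mailer with the original arguments, and records the mailer's output and exit status.

```shell
#!/bin/sh
# Log-then-forward mailer wrapper (sketch).
# MAILER and LOG_DIR are overridable so the wrapper can be tried without
# sending real mail; both names are assumptions made for this example.
MAILER="${MAILER:-/usr/bin/mail}"
LOG_DIR="${LOG_DIR:-/tmp}"

forward_mail() {
    log="$LOG_DIR/mail-wrapper.$$.log"
    body="$LOG_DIR/mail-wrapper.$$.body"

    # Save the message body first so it can be both logged and replayed.
    cat > "$body"

    {
        echo "date: $(date)"
        for arg in "$@"; do
            printf 'arg: [%s]\n' "$arg"
        done
        echo "--- body ---"
        cat "$body"
        echo "--- mailer output ---"
    } >> "$log"

    # "$@" keeps a multi-word subject as a single argument.
    "$MAILER" "$@" < "$body" >> "$log" 2>&1
    status=$?

    echo "mailer exit status: $status" >> "$log"
    rm -f "$body"
    # Pass the mailer's status back to the caller.
    return "$status"
}

# A mailer is expected to be called with arguments; do nothing when called bare.
if [ "$#" -gt 0 ]; then
    forward_mail "$@"
fi
```

Capturing the mailer's stdout and stderr in the same log is the point here: if the mailer prints an error or exits non-zero, it shows up even when the system mail logs stay empty.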

Anyway, SGE is certainly doing the right thing with respect to the
mailer parameter. I just don't know why calls to /usr/bin/mail are
failing without leaving any trace in the mail logs. I won't care as
much if the mail-sending wrapper script works, though!

-Chris



On Dec 1, 2005, at 11:07 PM, Ron Chen wrote:

> I remember helping someone here on this list who was
> encountering similar problems on MacOSX. But I could not find
> the original mail discussion. (I think that was 2 yrs ago).
>
> You can, however, use this script to log the parameters passed
> to the mailer:
>
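> #!/bin/sh
> # Log the arguments passed to the mailer, one file per invocation.
> echo "$@" > /tmp/mailer.$$
> # Quote "$@" so a multi-word subject stays a single argument.
> /usr/bin/mail "$@"
> echo "return code $?" >> /tmp/mailer.$$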
>
> 1) The version above corrects a few typos in the original
> version, which I sent in this mail:
>
> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=12137
>
>
> 2) The script doesn't log the contents of the email, but I
> believe the "to" address is the most important piece in most
> cases.
>
>  -Ron
>
>
> --- Chris Dagdigian <dag at sonsorol.org> wrote:
>> At this point I'm confused enough to hope that restarting the
>> execd daemon is going to work. If that fails it will be time to
>> write a mailer wrapper that does configurable logging. Anyway, it's
>> a strange issue: I can't reproduce the problem on any other OS X
>> cluster, even ones with identical SGE and network settings.
>>
>> -chris


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



