[GE users] SGE 6 sending emails

Ron Chen ron_chen_123 at yahoo.com
Wed Jan 5 20:04:20 GMT 2005


First of all, by "failed" I mean "SGE thought that it
failed". So please go back to my questions and
re-answer them.

To build shepherd:

1) get the tarball: sge-V60u1_TAG-src.tar.gz
2) go to source/
3) ./aimk -only-depend
4) scripts/zerodepend
5) ./aimk -no-secure -spool-classic sge_shepherd

I want you to insert some code in
daemons/shepherd/shepherd.c:main()

Search for ESSTATE_NO_EXITSTATUS, and you will see
this:

if (!SGE_STAT("exit_status", &buf) && buf.st_size) {
    /* retrieve first exit status from exit status file */
    if (!(fp = fopen("exit_status", "r")) ||
         (fscanf(fp, "%d\n", &return_code)!=1))
           return_code = ESSTATE_NO_EXITSTATUS;

I am interested in why it got inside the inner if
statement, so can you replace the code above with:

if (!SGE_STAT("exit_status", &buf) && buf.st_size) {
    /* retrieve first exit status from exit status file */
    if ((fp = fopen("exit_status", "r"))) {

        shepherd_trace("OK: reading exit_status");

        if (fscanf(fp, "%d\n", &return_code) != 1) {
            shepherd_trace("fscanf failed");
            return_code = ESSTATE_NO_EXITSTATUS;
            system("cp exit_status /tmp");
        } else {
            shepherd_trace("OK: fscanfing");
        }
    } else {
        return_code = ESSTATE_NO_EXITSTATUS;
        shepherd_trace("fopen exit_status returned NULL");
    }

Compile the new shepherd and put it in the bin
directory. If the problem happens again with this
debug shepherd, you should also find a copy of the
"exit_status" file in /tmp.
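
If you want to look at the copied file outside of SGE, here is a small standalone sketch of the same check. It is not shepherd code: read_exit_status() and the read_result names are made up for illustration, and it only relies on the standard fopen()/fscanf() behaviour (fscanf() returns EOF when the file ends before anything is read, and 0 when the first field is not a number).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch; they are not part of SGE. */
typedef enum {
    READ_OK,            /* an integer was parsed */
    READ_OPEN_FAILED,   /* fopen() returned NULL */
    READ_EMPTY,         /* end of file was reached before any digits */
    READ_NOT_A_NUMBER   /* the file has content, but not an integer */
} read_result;

/* Read the first integer from `path` into `*code` and report which
 * step failed, instead of folding every failure into one value. */
read_result read_exit_status(const char *path, int *code)
{
    FILE *fp;
    int matched;

    assert(path != NULL && code != NULL);

    fp = fopen(path, "r");
    if (fp == NULL) {
        return READ_OPEN_FAILED;
    }

    matched = fscanf(fp, "%d", code);
    fclose(fp);

    if (matched == 1) {
        return READ_OK;
    }
    if (matched == EOF) {
        return READ_EMPTY;
    }
    return READ_NOT_A_NUMBER;
}
```

Calling read_exit_status("/tmp/exit_status", &code) from a small main() and printing the result should tell you whether the file was unreadable, empty, or contained something other than a number.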

 -Ron

--- Gavin Kelman wrote:
> Ron Chen wrote:
> > I need more information in order to find out what's
> > wrong.
> > 
> > 1) So host watts is an IRIX machine? Do you have other
> > IRIX machines with the same OS and same configuration?
> 
> We have another IRIX machine but it's running a different
> patchlevel of IRIX 6.5.
> 
> > 2) Do all jobs fail when executed on watts? And do all
> > users encounter the same problem?
> 
> None of the jobs fail, we just get the emails to root.
> 
> > 3) Do non-interactive jobs fail?
> 
> Nope.
> 
> > In order to do that, some minor code changes are
> > required, do you know how to compile SGE from source?
> 
> Never done it.
> 
> Cheers,
> Gavin.
> 
> -- 
> Gavin Kelman
> UNIX Administrator
> LION Bioscience Ltd.               Voice:  +44 (0)1223 224751
> Compass House                      Fax:    +44 (0)1223 224701
> 80-82 Newmarket Road
> Cambridge CB5 8DZ, UK              Email:  gavin.kelman at uk.lionbioscience.com
> 



		

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



