[GE users] shepherd problem with local spool directory

Andy Schwierskott andy.schwierskott at sun.com
Wed Jun 23 09:51:23 BST 2004


Robert,

is it SGE 5.3p? or 6.0?

Please try the following to find out whether the job script really hasn't
been written. I don't think that is the case, because then there should be
an error message in the execd "messages" file.

Replace the shepherd binary with a script:

   #!/bin/sh
   sleep 3600

And once the job has been started please cross check whether the job script
in

  /var/spool/sge/theorie/boe/job_scripts/10

really doesn't exist.
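
A minimal sketch of that cross check, in case it helps. The function name
job_script_status is made up for illustration and is not part of SGE. Note
that "missing" is also what a user sees when a parent directory is not
searchable for them, so it is worth running once as root and once as the
job user and comparing the answers.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): report whether the given file
# exists and is readable for the user running this script.
job_script_status() {
    if [ ! -e "$1" ]; then
        # Also reported when a parent directory cannot be searched
        echo "missing"
    elif [ ! -r "$1" ]; then
        echo "unreadable"
    else
        echo "ok"
    fi
}
```

Example call while the placeholder shepherd is sleeping:

   job_script_status /var/spool/sge/theorie/boe/job_scripts/10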

Another reason for this problem might be insufficient access privileges on
one of the "upper" (parent) directories for the job user.

Do *all* directories in the path

  /var/spool/sge/theorie/boe/job_scripts

have "r-x" permissions for "other"?
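
The permission question can be checked with a small script along these
lines. This is only a sketch: the function name check_other_rx and its
optional second argument are my own invention, and it looks only at the
classic mode bits from "ls -ld", not at ACLs.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): walk from directory $1 up to
# $2 (default "/") and print every directory whose "other" permission
# bits lack "r" or "x". Returns 0 if all are fine, 1 otherwise.
check_other_rx() {
    dir=$1
    top=${2:-/}
    bad=0
    while :; do
        # First field of "ls -ld" is the mode string, e.g. drwxr-xr-x
        mode=$(ls -ld "$dir" 2>/dev/null | awk '{print $1}')
        # Characters 8-10 of the mode string are the bits for "other"
        other=$(printf '%s' "$mode" | cut -c8-10)
        case $other in
            r?x|r?t) ;;                    # readable and searchable
            *) echo "$dir: ${mode:-not found}"; bad=1 ;;
        esac
        # Stop at the chosen top directory (or at "/" or "." as a guard)
        case $dir in
            "$top"|/|.) break ;;
        esac
        dir=$(dirname "$dir")
    done
    return $bad
}
```

Example call; no output means every directory in the path is fine:

   check_other_rx /var/spool/sge/theorie/boe/job_scripts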

Andy


> Hi,
>
> I have a problem which probably has to do with the use of local spool
> directories. (For testing reasons I am using just one host and a simple
> default queue).
>
> I can submit jobs, but when they are passed to a matching queue, they
> can not be executed, because the job script has not been copied to the
> local spool directory "job_scripts". This directory is empty. I don't
> think that the problem is related to permissions being set wrongly, as
> the execd can happily write to the local spool directory.
>
> The execd message says:
>
> Tue Jun 22 22:49:00 2004|execd|boe|E|shepherd of job 10.1 exited with exit status = 11
>
> The qmaster message says:
>
> Tue Jun 22 22:49:00 2004|qmaster|boe|W|job 10.1 failed on host
> boe.severin.local general before job because: 06/22/2004 22:48:58
> [1000:3427]: unable to find job file
> "/var/spool/sge/theorie/boe/job_scripts/10"
> Tue Jun 22 22:49:00 2004|qmaster|boe|W|rescheduling job 10.1
> Tue Jun 22 22:49:00 2004|qmaster|boe|E|queue q4 marked QERROR as result of job 10's failure
>
> The output of the debug-email can be found in the remainder of this
> mail.
>
> Thanks for any help,
> Robert.
>
>


Regards,
Andy Schwierskott

--
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Andy Schwierskott           Tel:     +49 941 3075-200  (x60200)
N1 Grid Engine Engineering  Support: +49 941 3075-250  (x60250)
Sun Microsystems GmbH       Fax:     +49 941 3075-222  (x60222)
Dr.-Leo-Ritter-Str. 7       mailto:andy.schwierskott at sun.com
D-93049 Regensburg          http://www.sun.com/gridware
