[GE users] Erroneous job execution

Andreas Haas Andreas.Haas at Sun.COM
Tue Apr 18 14:30:57 BST 2006


On Sat, 15 Apr 2006, Hairul Ikmal Mohamad Fuzi wrote:

> Hi Andreas,
>
> Thanks for the reply.
>
> Sorry to say that I'm not very clear about this shepherd thingy.
> I would appreciate it if somebody could explain what the shepherd is
> in SGE terms and what it generally does in SGE.

Well ... a general description of the shepherd is available in the
sge_shepherd(8) man page.

>
> Regarding your suggestions,
> 3) job's active directory: is it the directory where users put
> their job scripts, or is it somewhere in the spool directory?

It is located in the spool directory. The active directory is a
directory created by sge_execd(8) for each job/task it starts.
The location of the active directory gets passed to the job via
the SGE_JOB_SPOOL_DIR environment variable described under qsub(1).
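
As a rough illustration of the above, a job script could report its own
active directory by reading SGE_JOB_SPOOL_DIR. This is only a sketch: the
function name show_active_dir is made up for this example, and the
directory itself may not be readable by the job owner.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): print the job's active
# directory as passed in by sge_execd via SGE_JOB_SPOOL_DIR.
show_active_dir() {
    if [ -n "${SGE_JOB_SPOOL_DIR:-}" ]; then
        echo "active directory: $SGE_JOB_SPOOL_DIR"
    else
        # The variable is only set for processes started as part of a job.
        echo "SGE_JOB_SPOOL_DIR is not set" >&2
        return 1
    fi
}

# Inside a job script one would simply call:
#   show_active_dir
```

Submitting a script that calls this function through qsub should leave the
path in the job's output file.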


> 5) How do I start/What command should I use to start the shepherd
> using user 'root' ?

  You start it simply with

    # $SGE_ROOT/bin/<arch>/sge_shepherd
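
If you are unsure what to substitute for <arch>, the following sketch
builds the full path. It assumes SGE_ROOT is set and that your installation
ships the architecture script as $SGE_ROOT/util/arch; the function name
shepherd_path is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the path to sge_shepherd for this host.
# Assumes $SGE_ROOT/util/arch exists and prints the architecture string.
shepherd_path() {
    arch=$("$SGE_ROOT/util/arch") || return 1
    echo "$SGE_ROOT/bin/$arch/sge_shepherd"
}

# Example use, as root:
#   "$(shepherd_path)"
```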

>
> And.. I'm just wondering: is this a software/config based error, or is
> there any possibility that this kind of error is caused by hardware
> failure?

I wouldn't rule it out, but a HW failure is hard to imagine.
I still recommend you investigate the problem as I described.
Unfortunately I can't give you any better advice.

> Just FYI, I'm using SGE (v6.something) which comes together with Rocks
> 4.1 Linux Cluster Distribution.

   # qconf -help | head -5

gets you the exact Grid Engine version.


Regards,
Andreas
