[GE users] Erroneous job execution

John Saalwaechter johnsaalwaechter at yahoo.com
Tue Apr 18 14:54:42 BST 2006

I just encountered the same error in SGE, and the root cause
was somewhat obscure.  Perhaps you have the same issue.  For me
it was the fact that a few nodes in a cluster inadvertently
used NFS soft mounts instead of hard mounts.  This included
mounting $SGE_ROOT as a soft mount.

Depending on your OS, you could check the output of "mount" or
grep through /proc/mounts to see whether any of your NFS file
systems are soft-mounted.
The problem with soft mounts in a cluster is that once the
timeout period expires, a soft mount returns an I/O error to the
application instead of retrying until the server responds.  If
the bandwidth to the $SGE_ROOT NFS mount becomes a bottleneck and
the mount is soft, SGE will see various I/O failures.

Hope this helps.

By the way, we mount everything both hard and interruptible
(i.e. "hard,intr").

On Sat, 15 Apr 2006, Hairul Ikmal Mohamad Fuzi wrote:
>Hi Andreas,
>Thanks for the reply.
>Sorry to say that I'm not very clear about this shepherd thingy.
>I would appreciate it if somebody could explain to me 'What is shepherd
>in SGE terms?' and what shepherd generally does in SGE.
>Regarding your suggestions,
>3) job's active directory : is it the directory where the users put
>their job script, or is it somewhere in the spool directory ?
>5) How do I start/What command should I use to start the shepherd
>using user 'root' ?
>And..I'm just wondering .. is this a software/config based error or is
>there any possibility that this kind of error is caused by hardware?
>Just FYI, I'm using SGE (v6.something) which comes together with Rocks
>4.1 Linux Cluster Distribution.
>Thanks again!
>On 4/12/06, Andreas Haas <Andreas.Haas at sun.com> wrote:
>> Hi Ikmal,
>> it tells you shepherd "failed before writing exit_status".
>> This could mean there was an error condition shepherd could
>> not handle. From shepherd's trace file output I can't assess
>> what might have caused this.

johnsaalwaechter at yahoo.com
