[GE users] shepherd error can't stat()

Ruppert dieter_ruppert at siemens.com
Wed Dec 12 17:21:00 GMT 2007


It seems that this symptom is the same as the one I posted a few days ago.


I thought the culprit might be our checkpointing mechanism, but now I suspect
it is a more general problem: in some (infrequent) cases the
shepherd process runs into a permission problem when it tries to access the
stdout_path directory at job startup.

It may be useful to compare the configurations that exhibit this
problem; perhaps they share a common factor.

We use Grid Engine 6.0u6 exclusively on Solaris 10/SPARC, with NIS+ as the
name service. Job stdout/stderr goes to a directory on the user's
workstation, which is NFS-mounted (no external storage; usually just the
internal disk of a Blade-1500).

What could we try in order to work around this? We could direct stdout/stderr
to a local disk on the compute node, but that has the disadvantage of
scattering the output across all compute nodes.

Is there anything we could do in a prologue script to prevent this from
happening? From the output, it seems that the prologue script may run
before the error occurs.
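One thing that might be tried in a prologue is sketched below in Python: it calls stat() on the output directory a few times, pausing between attempts, and asks Grid Engine to reschedule the job if the directory still cannot be reached. All of this is assumption: the retry count and delay are arbitrary, the directory is expected as a command-line argument, and exit code 99 is assumed to be the prolog exit status that makes Grid Engine reschedule the job (verify against queue_conf(5) for 6.0u6). Whether an early stat() helps at all depends on the failure being a transient NFS effect, which has not been established.

```python
import os
import sys
import time

# Hypothetical settings: number of stat() attempts and pause between them.
RETRIES = 5
DELAY_SECONDS = 2.0

# Assumed prolog exit status that asks Grid Engine to reschedule the job;
# check queue_conf(5) for your release before relying on it.
EXIT_RESCHEDULE = 99


def wait_for_dir(directory, retries=RETRIES, delay=DELAY_SECONDS, sleep=time.sleep):
    """Try os.stat() on `directory` up to `retries` times; True on success."""
    for attempt in range(retries):
        try:
            os.stat(directory)
            return True
        except OSError as err:
            # Log to stderr so the failure shows up in the job's error output.
            sys.stderr.write("prolog: stat(%s) failed: %s (attempt %d/%d)\n"
                             % (directory, err, attempt + 1, retries))
            if attempt + 1 < retries:
                sleep(delay)
    return False


def main(argv, sleep=time.sleep):
    """Check every directory given on the command line (hypothetical interface)."""
    for directory in argv[1:]:
        if not wait_for_dir(directory, sleep=sleep):
            return EXIT_RESCHEDULE
    return 0  # all directories reachable, or nothing to check
```

To use it as a prolog, the script would end with `sys.exit(main(sys.argv))` so that the return value becomes the exit status, and would be invoked with the output directory as its argument; how that directory is passed in from the queue configuration is left open here.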

