[GE users] jobs fail because sungrid can't create job directory??

Avishai Ish-Shalom avishi at cc.huji.ac.il
Mon May 12 21:32:06 BST 2008



Hi all.
I'm having a very strange problem with the execution hosts, and I have
failed to track it down.
The log on the qmaster says:

qmaster|spider|W|job 249.1 failed on host 
spider3.spiderweb.fh.huji.ac.il general assumedly before job because: 
can't create directory active_jobs/249.1: No such file or directory

I have double-checked permissions, and they seem OK. The strangest thing
is that only about 50% of jobs fail. The failed jobs return to the queue
and then run normally elsewhere. This problem happens on all my nodes
(not surprising, since they are all exactly the same).

I'm running SLES10SP1, kernel 2.6.16.4 x86_64 smp.

Thanks,
Avishai
