[GE users] can't open output file error

Andy Schwierskott andy.schwierskott at sun.com
Thu Jul 29 12:54:10 BST 2004


Is this a (tightly integrated) parallel job? There has been a bug which is

> Hi,
> I'm using SGE 5.3p5 as part of Rocks 3.1.
> Sometimes when I submit a large number of jobs (say 200-300 jobs where 250 or
> so slots are available), a number of the jobs end up in "Eqw" and the
> following type of message appears in the messages file on the qmaster:
> Thu Jul 29 04:18:04 2004|qmaster|linga|W|job 19347.1 failed on host compute-1-13.local general  opening output file because: 07/29/2004 04:18:02 [501:5855]: error: can't open output file "/home/apseyed/scripts/dd": Is a directory
> Note that the parameter "#$ -cwd" is set in the scripts and the jobs
> are being launched from "/home/apseyed/scripts/dd".
> I find this message strange, and it only occurs for some of the jobs
> although the scripts are identical in behavior. It could be a result of network
> throttling, but I'm not so sure. Any ideas on the error message specifically or
> otherwise?
> Cheers,
> Patrice
> P.S. While I'm asking, what does this type of message indicate? (I see it from
> time to time in the SGE logs. It is non-fatal, but I'm curious as to why it appears.)
> Thu Jul 29 04:51:53 2004|qmaster|<hostname>|E|can not remove job spool file: zombies/00/0001/9349
> Thu Jul 29 04:51:53 2004|qmaster|<hostname>|E|ERROR: unlinking "zombies/00/0001/9350": No such file or directory
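
Since the failures hit only some of the jobs, it may help to check whether they cluster on particular execution hosts. The following is a minimal Python sketch for tallying them, assuming the pipe-separated layout visible in the quoted log lines (timestamp|daemon|host|level|message). The function names are hypothetical and not part of SGE, and the location of the qmaster messages file depends on the installation.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the quoted log lines only:
#   timestamp|daemon|host|level|message
JOB_FAILED = re.compile(r"job (\d+\.\d+) failed on host (\S+)")


def parse_line(line):
    """Split one messages-file line into its five fields, or return None."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, daemon, host, level, message = parts
    return {
        "timestamp": timestamp,
        "daemon": daemon,
        "host": host,
        "level": level,
        "message": message,
    }


def failed_jobs_by_host(lines):
    """Count 'job ... failed on host ...' messages per execution host."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip lines that do not match the assumed layout
        match = JOB_FAILED.search(record["message"])
        if match:
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    # Sample line taken from the message quoted above.
    sample = [
        'Thu Jul 29 04:18:04 2004|qmaster|linga|W|job 19347.1 failed on host '
        'compute-1-13.local general  opening output file because: 07/29/2004 '
        '04:18:02 [501:5855]: error: can\'t open output file '
        '"/home/apseyed/scripts/dd": Is a directory',
    ]
    for host, count in failed_jobs_by_host(sample).most_common():
        print(host, count)
```

To run it against a real log, pass an open file object for the messages file to `failed_jobs_by_host` in place of the sample list. If the counts concentrate on a few hosts, that points at those nodes (for example their view of the shared home directory) rather than at the job scripts.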
