[GE users] can't open output file error

Patrice Seyed apseyed at bu.edu
Thu Jul 29 17:51:49 BST 2004


Andy,

No, just a large number of serial batch jobs submitted at the same time.

-Patrice

On Thu, 29 Jul 2004, Andy Schwierskott wrote:

> Hi,
>
> is it a (tightly integrated) parallel job? There was a bug there which has
> been fixed.
>
> > Hi,
> >
> > I'm using SGE 5.3p5 as part of Rocks 3.1.
> >
> > Sometimes when I submit a large number of jobs at once (say 200-300 jobs with
> > about 250 slots available), some of the jobs end up in the "Eqw" state and a
> > message like the following appears in the qmaster messages file:
> >
> > Thu Jul 29 04:18:04 2004|qmaster|linga|W|job 19347.1 failed on host
> > compute-1-13.local general  opening output file because: 07/29/2004
> > 04:18:02 [501:5855]: error: can't open output file
> > "/home/apseyed/scripts/dd": Is a directory
> >
> > Note that the scripts set "#$ -cwd" and the jobs are submitted from
> > "/home/apseyed/scripts/dd". I find this message strange, especially since it
> > only occurs for some of the jobs even though the scripts behave identically.
> > It could be a result of network throttling, but I'm not sure. Any ideas about
> > this error message in particular, or otherwise?
> >
> > Cheers,
> > Patrice
> >
> > P.S. While I'm asking: what does this type of message indicate? I see it from
> > time to time in the SGE logs, and although it is non-fatal I'm curious why it
> > appears:
> >
> > Thu Jul 29 04:51:53 2004|qmaster|<hostname>|E|can not remove job spool file:
> > zombies/00/0001/9349
> > Thu Jul 29 04:51:53 2004|qmaster|<hostname>|E|ERROR:
> > unlinking "zombies/00/0001/9350": No such file or directory
>
