[GE users] error writing to file "job_scripts/25024" : No such file or directory

reuti reuti at staff.uni-marburg.de
Tue Feb 9 10:07:45 GMT 2010


On 09.02.2010 at 01:58, danielgoolsby wrote:

> Sure enough, there was a 'tmpwatch' cron job in /etc/cron.daily that
> was deleting the execd.pid and one of the directories.
>
> I re-ran inst_sge -x and corrected it. I couldn't find a config
> file that had the tmp directory in it. Where is that information
> stored?

Either in SGE's global configuration or in the local configuration of
the machine, if one exists (`qconf -mconf` or `qconf -mconf <exechost>`,
respectively). The entry is "execd_spool_dir".
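
To look at the value without opening an editor, `qconf -sconf` (or
`qconf -sconf <exechost>` for a local configuration) prints the same
entries read-only. A rough sketch of how one might check it follows. The
helper names are made up for illustration and are not part of SGE, and
the per-host subdirectory (`<execd_spool_dir>/<hostname>`) and the list of
expected entries are assumptions taken from the listings quoted in this
thread.

```shell
#!/bin/sh
# Sketch only: these helper functions are hypothetical, not SGE tools.

# Print the value of execd_spool_dir from `qconf -sconf`-style text on stdin.
spool_dir_from_conf() {
    awk '$1 == "execd_spool_dir" { print $2; exit }'
}

# Succeed if the given path is /tmp itself or lies below it, i.e. in a
# place that a tmpwatch-style cleanup job may prune.
is_under_tmp() {
    case "$1" in
        /tmp|/tmp/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Print the entries missing from one host's spool directory. The expected
# names are assumed from the healthy `ls -l` listing quoted in this thread.
missing_spool_entries() {
    dir=$1
    for entry in active_jobs jobs job_scripts execd.pid; do
        [ -e "$dir/$entry" ] || echo "$entry"
    done
}

# Example use on an execution host (needs qconf in PATH):
#   dir=$(qconf -sconf | spool_dir_from_conf)
#   is_under_tmp "$dir" && echo "warning: $dir is below /tmp"
#   missing_spool_entries "$dir/$(hostname -s)"
```

If a host has a local configuration, its execd_spool_dir takes
precedence, so it is worth checking `qconf -sconf <exechost>` as well.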

-- Reuti


>
> Daniel
>
> On Sun, Feb 7, 2010 at 4:49 AM, reuti <reuti at staff.uni-marburg.de>  
> wrote:
> Hi,
>
> On 05.02.2010 at 21:03, danielgoolsby wrote:
>
> > I have about a 50 server (~400 slot) implementation, and I seem to be
> > getting this error more often.
> >
> > A user submits a job and the job sits in the queue, but for some
> > reason it errors out on a few nodes in the process before finally
> > finding a host where it can start.
> >
> > I then have to go in and clear the queue errors so that other people
> > can submit jobs to the queue (with 'qmod -cq queuename.q').
> >
> > If I do 'qstat -j <job #>', I get these error reasons:
> >
> > error reason    1:          error writing to file "job_scripts/25024": No such file or directory
> >                 1:          error writing to file "job_scripts/25024": No such file or directory
> >                 1:          error writing to file "job_scripts/25024": No such file or directory
> >                 1:          error writing to file "job_scripts/25024": No such file or directory
> >                 1:          error writing to file "job_scripts/25024": No such file or directory
> > scheduling info:            queue instance "big.q@node1" dropped because it is disabled
> >                             queue instance "big.q@node2" dropped because it is disabled
> >
> > etc...
> >
> > But the job finds a host and starts to run.  I've been getting these
> > more often, but haven't figured out why.
> >
> > If I 'ls -l' on the execd_spool_dir I get something that looks like
> > this:
> >
> > [root@node1 ~]# ls -l /tmp/gridengine/node1/
>
> In many Linux distributions, a cron job removes outdated files and
> directories from /tmp by default. Can you adjust your setup to use
> a directory such as /var/spool/sge for SGE's local spool files?
>
> -- Reuti
>
>
> > total 20
> > drwxr-xr-x 3 root root 4096 Feb  5 10:12 active_jobs
> > -rw-r--r-- 1 root root    5 Feb  3 14:34 execd.pid
> > drwxr-xr-x 3 root root 4096 Feb  5 10:12 jobs
> > drwxr-xr-x 2 root root 4096 Feb  5 10:12 job_scripts
> > -rw-r--r-- 1 root root 2228 Feb  3 14:34 messages
> >
> > Whereas on a 'broken' host, I get this:
> >
> > [root@node3 cab103]# ls -l
> > total 16
> > drwxr-xr-x 2 root root 4096 Feb  5 10:12 active_jobs
> > drwxr-xr-x 2 root root 4096 Feb  5 10:12 jobs
> > -rw-r--r-- 1 root root 4394 Feb  3 15:30 messages
> >
> > Does anyone know why the execd.pid file or the job_scripts
> > directory would be deleted? I can understand the job_scripts dir
> > being deleted, but not the execd.pid.
> >
> > Or I could be looking at the wrong information.. who knows..
> >
> > Can anyone help?
> >
> > Daniel
> >
> > ------------------------------------------------------
> > http://gridengine.sunsource.net/ds/viewMessage.do?
> > dsForumId=38&dsMessageId=243550
> >
> > To unsubscribe from this discussion, e-mail: [users-
> > unsubscribe at gridengine.sunsource.net].
> >
>
>
>
> -- 
> --daniel
> --

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=244063

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


