[GE users] Job Failure Deletes Local Spool Directory
Dan.Gruhn at Group-W-Inc.com
Tue Feb 22 14:21:51 GMT 2005
Interesting idea, but I don't see any cron jobs that do this. As a
test, I have created a file in each /tmp directory to see whether it
disappears when this happens again.
Anyone have any other ideas?
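
The sentinel-file test described above could be automated with something
like the following rough sketch. It only reports whether the spool
directory and the test file still exist. The path /tmp/sgespool comes from
the original report; the sentinel file name /tmp/sge_sentinel and the
function names are hypothetical, chosen purely for illustration.

```python
import os
import time


def check_paths(paths):
    """Return a dict mapping each path to True if it exists, else False."""
    return {p: os.path.exists(p) for p in paths}


def format_report(status, now=None):
    """Build a one-line, timestamped summary of which paths are missing."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    missing = sorted(p for p, ok in status.items() if not ok)
    if missing:
        return f"{stamp} MISSING: {', '.join(missing)}"
    return f"{stamp} all present"


if __name__ == "__main__":
    # /tmp/sgespool is the local spool dir mentioned in the thread.
    # /tmp/sge_sentinel is an assumed name for the test file.
    watched = ["/tmp/sgespool", "/tmp/sge_sentinel"]
    print(format_report(check_paths(watched)))
```

Run periodically on each execution host, this would show whether the test
file vanishes together with the spool directory (which would point towards
something cleaning /tmp as a whole) or whether only the spool directory
goes missing.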
On Tue, 2005-02-22 at 09:00, Reuti wrote:
> maybe it wasn't done by SGE: is there a cron job running on the machine
> to clean the /tmp from time to time?
> Cheers - Reuti
> Dan Gruhn wrote:
> > Greetings Everyone,
> > I am using Fedora Core 1 to run 6.0u3 and have a strange failure mode.
> > I get an administration email of the following:
> > Subject: "N1GE 6.0u3: Job-array task 6574.262 failed "
> > Job 6574 caused action: Queue "low.q at class05-lx.group-w-inc.com" set to ERROR
> > User = dgruhn
> > Queue = low.q at class05-lx.group-w-inc.com
> > Host = class05-lx.group-w-inc.com
> > Start Time = <unknown>
> > End Time = <unknown>
> > failed assumedly before job:can't create directory active_jobs/6574.262:
> > No such file or directory
> > When I look on the host, I see that the execution daemon is running just
> > fine, but that my local spool directory (/tmp/sgespool in my case) is
> > completely gone without a trace. There is no /tmp/execd error file or
> > anything.
> > These hosts are single processor, Pentium(R) 4 CPU 1.80GHz with 512 MB
> > of RAM. They are the least capable in my set of hosts. The error
> > doesn't happen a lot, but it has happened enough that I'd like to solve
> > it if possible. Of course, SGE recovers the job and runs it on another
> > host, but that queue is out of action until I shut down the execution
> > daemon and bring it back up. It then recreates the local spool dir and
> > all is well.
> > Has anyone else experienced this or have any idea what may be
> > happening? That is, what in SGE would delete the entire local spool
> > directory tree but leave the executor running?
> > Any help will be greatly appreciated.
> > Dan