[GE users] Job Failure Deletes Local Spool Directory

Reuti reuti at staff.uni-marburg.de
Tue Feb 22 14:00:52 GMT 2005


Maybe it wasn't done by SGE: is there a cron job running on the machine 
that cleans /tmp from time to time?

Cheers - Reuti
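One way to follow up on this suggestion is to search the system cron configuration for anything that touches /tmp. The Python sketch below is only an illustration. The list of cron locations is an assumption about a typical Red Hat-style host. Treating `tmpwatch`/`tmpreaper` as the likely cleaners is also an assumption. The function names are hypothetical, not part of SGE or any cron tool.

```python
import os
import re

# Assumed cron locations on a Red Hat-style system; adjust for the host.
DEFAULT_CRON_PATHS = [
    "/etc/crontab",
    "/etc/cron.d",
    "/etc/cron.hourly",
    "/etc/cron.daily",
    "/etc/cron.weekly",
    "/etc/cron.monthly",
    "/var/spool/cron",  # per-user crontabs, usually readable only by root
]

# Matches lines mentioning /tmp or a common age-based /tmp cleaner (assumed names).
TMP_CLEANER = re.compile(r"(/tmp\b|tmpwatch|tmpreaper)")


def iter_files(path):
    """Yield path if it is a regular file, or the regular files directly inside it."""
    if os.path.isfile(path):
        yield path
    elif os.path.isdir(path):
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if os.path.isfile(full):
                yield full


def find_tmp_cleaners(paths=DEFAULT_CRON_PATHS):
    """Return (file, line_number, line) for each non-comment cron line that touches /tmp."""
    hits = []
    for path in paths:
        for filename in iter_files(path):
            try:
                with open(filename, errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        stripped = line.strip()
                        if stripped.startswith("#"):
                            continue  # skip comment lines
                        if TMP_CLEANER.search(stripped):
                            hits.append((filename, lineno, stripped))
            except OSError:
                continue  # unreadable file (e.g. permissions); skip it
    return hits


if __name__ == "__main__":
    for filename, lineno, line in find_tmp_cleaners():
        print(f"{filename}:{lineno}: {line}")
```

Run it as root on one of the affected execution hosts. Any hit is worth comparing against the location of the execd spool directory.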

Dan Gruhn wrote:
> Greetings Everyone,
> I am running N1GE 6.0u3 on Fedora Core 1 and have a strange failure mode.
> I get an administration email like the following:
> Subject: "N1GE 6.0u3: Job-array task 6574.262 failed"
> Job 6574 caused action: Queue "low.q@class05-lx.group-w-inc.com" set to ERROR
>  User        = dgruhn
>  Queue       = low.q@class05-lx.group-w-inc.com
>  Host        = class05-lx.group-w-inc.com
>  Start Time  = <unknown>
>  End Time    = <unknown>
> failed assumedly before job:can't create directory active_jobs/6574.262: 
> No such file or directory
> When I look on the host, I see that the execution daemon is running just 
> fine, but my local spool directory (/tmp/sgespool in my case) is 
> completely gone without a trace. There is no /tmp/execd error file or 
> anything else.
> These hosts are single-processor Pentium(R) 4 1.80GHz machines with 512 MB 
> of RAM, the least capable in my set of hosts. The error 
> doesn't happen often, but it has happened enough that I'd like to solve 
> it if possible. SGE does recover the job and run it on another 
> host, but that queue is out of action until I shut down the execution 
> daemon and bring it back up. The daemon then recreates the local spool 
> directory and all is well.
> Has anyone else experienced this, or does anyone have an idea what may be 
> happening? That is, what in SGE would delete the entire local spool 
> directory tree but leave the execution daemon running?
> Any help will be greatly appreciated.
> Dan
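The next sketch shows how an age-based cleaner could remove an entire spool tree such as /tmp/sgespool while sge_execd keeps running. It is a simplified Python model, not the actual rule used by tmpwatch or any other tool. The model compares modification times only, whereas real cleaners may date files by access time. A file or symlink counts as removable once its mtime is older than the cutoff. A directory counts as removable only if it is also that old and everything inside it is removable. The function name `removable_paths` is hypothetical.

```python
import os
import time


def removable_paths(root, max_age_hours, now=None):
    """Return the paths under root (including root) that an mtime-based cleaner could delete.

    Simplified model: a file or symlink is removable if its mtime is older
    than the cutoff; a directory is removable only if its own mtime is older
    than the cutoff AND everything inside it is removable.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_hours * 3600
    removable = set()
    # Bottom-up walk: subdirectories are decided before their parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        all_children_removable = True
        for name in filenames + dirnames:
            full = os.path.join(dirpath, name)
            if os.path.isdir(full) and not os.path.islink(full):
                ok = full in removable  # already decided earlier in the walk
            else:
                ok = os.lstat(full).st_mtime < cutoff
                if ok:
                    removable.add(full)
            all_children_removable = all_children_removable and ok
        if all_children_removable and os.lstat(dirpath).st_mtime < cutoff:
            removable.add(dirpath)
    return sorted(removable)
```

For example, `removable_paths("/tmp/sgespool", 240)` would list what such a cleaner could delete with a 240-hour (ten-day) threshold. The threshold is an example value. If the spool root itself appears in the result, the whole tree is eligible, which would match the symptom of the directory vanishing without a trace.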


More information about the gridengine-users mailing list