[GE users] Rq state, but it never cleans up

King, Stefan sking at sepaton.com
Fri Jan 20 16:49:27 GMT 2006


I cannot run a certain job on a two-node cluster.

cannot run on host "node0X" until clean up of an previous run has finished

cannot run on host "node1" until clean up of an previous run has finished

Restarting all daemons has no effect.

Running qdel on the job removes it, but every resubmission hits the same
problem.

Combining the two measures likewise has no effect.

The job is submitted via DRMAA.

Submitting a trivial job (simple.sh) via qsub works.

Submitting a different job via DRMAA works.

How does SGE decide that cleanup needs to be done for this job?

Is there a manual way to accomplish this "clean up" that SGE seems to
want?

Is it likely that my (flat-file) database is corrupt?

I need to understand what happened more than I need to get the job
running.

Any suggestions appreciated; I've been stuck for days.

Stefan



