[GE users] Jobs still shown as running after process has died

robhorton r.horton at qmul.ac.uk
Mon Aug 16 10:24:42 BST 2010


(sorry for the delay)

On Fri, 2010-08-13 at 15:20 +0200, reuti wrote:
...
> 
> So it looks like the execd was aware of the end of the task, but the info never made it to the qmaster.
> 
> When you delete the array tasks that are still shown as running (by also supplying the task index in the `qdel` command), do you get any error messages in the node's messages file?
> 
> "received task belongs to job 1234 but this job is not here"

Yes:

08/16/2010 10:19:05|  main|compute-3-24|E|received task belongs to job 36451 but this job is not here
08/16/2010 10:19:05|  main|compute-3-24|E|received task belongs to job 36451 but this job is not here
08/16/2010 10:19:05|  main|compute-3-24|E|acknowledge for unknown job 36451.11/master
08/16/2010 10:19:05|  main|compute-3-24|E|can't find active jobs directory "active_jobs/36451.11" for reaping job 36451
08/16/2010 10:19:05|  main|compute-3-24|E|ERROR: unlinking "jobs/00/0003/6451.11": No such file or directory
08/16/2010 10:19:05|  main|compute-3-24|E|can not remove file job spool file: jobs/00/0003/6451.11
08/16/2010 10:19:05|  main|compute-3-24|E|can't remove directory "active_jobs/36451.11": opendir(active_jobs/36451.11) failed: No such file or directory
08/16/2010 10:19:05|  main|compute-3-24|E|ja-task "36451.11" is unknown - reporting it to qmaster

The task does disappear from the queue at this point.
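The execd entries above use a pipe-separated layout: timestamp, thread, host, severity, message. When several nodes show the same symptom, it may help to list which job/task IDs each execd reported back as unknown. Below is a minimal Python sketch. It assumes every line of the messages file follows that five-field layout and that the wording 'ja-task "JOB.TASK" is unknown' stays as seen above. The names `parse_line` and `unknown_tasks` are hypothetical helpers, not part of Grid Engine.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple


class LogEntry(NamedTuple):
    # Assumed field order, inferred from the execd excerpt above:
    # "MM/DD/YYYY HH:MM:SS|thread|host|severity|message"
    timestamp: str
    thread: str
    host: str
    severity: str
    message: str


# Matches the execd error 'ja-task "36451.11" is unknown - reporting it to qmaster'
UNKNOWN_TASK = re.compile(r'ja-task "(\d+)\.(\d+)" is unknown')


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one messages-file line into its five fields, or return None."""
    parts = line.rstrip("\n").split("|", 4)  # the message itself may not contain '|'
    if len(parts) != 5:
        return None
    return LogEntry(*(p.strip() for p in parts))


def unknown_tasks(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (job_id, task_id) pairs that the execd reported to qmaster as unknown."""
    found = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry.severity != "E":
            continue  # skip malformed lines and non-error entries
        match = UNKNOWN_TASK.search(entry.message)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


if __name__ == "__main__":
    sample = [
        "08/16/2010 10:19:05|  main|compute-3-24|E|acknowledge for unknown job 36451.11/master",
        '08/16/2010 10:19:05|  main|compute-3-24|E|ja-task "36451.11" is unknown - reporting it to qmaster',
    ]
    print(unknown_tasks(sample))  # [(36451, 11)]
```

Matching only the "is unknown - reporting it to qmaster" line avoids counting the same task several times. The surrounding errors (acknowledge, reaping, unlinking) all refer to the same task.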

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274646