[GE users] Jobs still shown as running after process has died

reuti reuti at staff.uni-marburg.de
Mon Aug 16 13:42:38 BST 2010


On 16.08.2010 at 11:24, robhorton wrote:

> (sorry for the delay)
> 
> On Fri, 2010-08-13 at 15:20 +0200, reuti wrote:
> ...
>> 
>> So it looks like the execd was aware of the end of the task, but the info never made it to the qmaster.
>> 
>> When you delete the array tasks that are still shown as running (by also supplying the task index in the `qdel` command), do you get any error messages in the node's messages file?
>> 
>> "received task belongs to job 1234 but this job is not here"
> 
> Yes:
> 
> 08/16/2010 10:19:05|  main|compute-3-24|E|received task belongs to job 36451 but this job is not here
> 08/16/2010 10:19:05|  main|compute-3-24|E|received task belongs to job 36451 but this job is not here
> 08/16/2010 10:19:05|  main|compute-3-24|E|acknowledge for unknown job 36451.11/master
> 08/16/2010 10:19:05|  main|compute-3-24|E|can't find active jobs directory "active_jobs/36451.11" for reaping job 36451
> 08/16/2010 10:19:05|  main|compute-3-24|E|ERROR: unlinking "jobs/00/0003/6451.11": No such file or directory
> 08/16/2010 10:19:05|  main|compute-3-24|E|can not remove file job spool file: jobs/00/0003/6451.11
> 08/16/2010 10:19:05|  main|compute-3-24|E|can't remove directory "active_jobs/36451.11": opendir(active_jobs/36451.11) failed: No such file or directory
> 08/16/2010 10:19:05|  main|compute-3-24|E|ja-task "36451.11" is unknown - reporting it to qmaster
> 
> It does disappear from the queue at this point.
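To see which jobs on a node hit this condition, a minimal Python sketch can scan the execd messages file and count the "received task belongs to job N but this job is not here" errors per host and job ID. It assumes the pipe-separated record layout seen in the excerpt above (date time|thread|host|level|message). The function name `orphaned_jobs` is hypothetical and not part of Grid Engine.

```python
import re
from collections import Counter

# Matches the execd error quoted above; the wording is copied from that log.
NOT_HERE = re.compile(r"received task belongs to job (\d+) but this job is not here")


def orphaned_jobs(lines):
    """Count 'job is not here' errors per (host, job ID).

    Assumes each record looks like: date time|thread|host|level|message
    """
    counts = Counter()
    for line in lines:
        # maxsplit=4 keeps any '|' inside the message text intact
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue  # not a messages-file record, skip it
        host, level, message = fields[2].strip(), fields[3].strip(), fields[4]
        if level != "E":
            continue  # only error-level records are of interest here
        match = NOT_HERE.search(message)
        if match:
            counts[(host, int(match.group(1)))] += 1
    return counts
```

On a real node one would pass an open file object for that host's messages file; where that file lives depends on the site's spool configuration.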

Is the spool directory local, e.g. in /tmp, and is some cron job removing it?
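One way to check that question is a small Python sketch that tests whether a node's execd spool directory resolves to a location that a tmp-cleaning cron job would typically sweep. The list of cleaned roots (`/tmp`, `/var/tmp`) is an assumption, and `spool_under_cleaned_dir` is a hypothetical helper; the actual spool path would typically come from the `execd_spool_dir` entry of the cluster configuration (`qconf -sconf`).

```python
import os

# Assumption: directories that periodic cleanup jobs (e.g. tmpwatch-style cron
# entries) commonly prune. Adjust for the site's actual cron configuration.
DEFAULT_CLEANED_ROOTS = ("/tmp", "/var/tmp")


def spool_under_cleaned_dir(spool_dir, cleaned_roots=DEFAULT_CLEANED_ROOTS):
    """Return the cleaned root that contains spool_dir, or None.

    Symlinks are resolved first, so a spool path that only reaches /tmp
    through a link is still detected.
    """
    spool = os.path.realpath(spool_dir)
    for root in cleaned_roots:
        root = os.path.realpath(root)
        # commonpath equals root exactly when spool lies at or below root
        if os.path.commonpath([spool, root]) == root:
            return root
    return None
```

A `None` result only rules out the listed roots; it does not prove that no other cleanup job touches the spool area.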

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274687



