[GE users] Job does not exist

Reuti reuti at Staff.Uni-Marburg.DE
Mon Sep 1 17:17:32 BST 2008


Hi,

On 01.09.2008 at 16:18, Jonathan Hunt wrote:

> Hi,
>
> I am using SGE 6.2 running on OS X 10.5.4 Server. Shares are NFS  
> automaps.
>
> I am getting the following errors in the sge_execd log files on my
> nodes, and so jobs never complete:
>
> 09/01/2008 23:46:30|  main|qbi-xgrid-02|E|shepherd of job 53.1 exited with exit status = 11
> 09/01/2008 23:46:30|  main|qbi-xgrid-02|W|reaping job "53" ptf complains: Job does not exist
>
> However, I can temporarily fix this by logging in to a particular node
> with ssh and restarting the daemon:
> sudo killall sge_execd
> sudo /sge/default/common/sgeexecd
> BUT this fix lasts only while the ssh session stays logged in to that
> node. Quite reproducibly, logging out of ssh makes the node revert
> to failing to find the jobs.
>
> Any idea of what is going on and how to fix it? Any help appreciated.

Do you have the spool directory of the nodes on local disk, or is it also on NFS?
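
One way to check this on a node is sketched below. It is only a rough
heuristic, not an official Grid Engine tool: the function names
classify_device and spool_fs_kind are made up for this example, and it
assumes that NFS mounts show up in the first column of df as host:/path
and that OS X lists automount trigger points as "map ...".

```shell
#!/bin/sh
# Sketch: guess whether the execd spool directory sits on a network mount.

# classify_device DEVICE
# Prints "remote" if DEVICE looks like host:/path (the usual form of an
# NFS mount in df output), "automount" if it is the word "map"
# (assumption: how OS X df labels autofs trigger points), else "local".
classify_device() {
    case "$1" in
        *:/*) echo remote ;;
        map)  echo automount ;;
        *)    echo local ;;
    esac
}

# spool_fs_kind DIR
# Prints the classification of the filesystem that holds DIR.
spool_fs_kind() {
    # df -P prints a header line, then the filesystem line for DIR;
    # field 1 of that second line is the device or remote export.
    dev=$(df -P "$1" | awk 'NR == 2 { print $1 }')
    classify_device "$dev"
}

# Only query Grid Engine when qconf is on the PATH.
if command -v qconf >/dev/null 2>&1; then
    # execd_spool_dir is a parameter of the global cluster configuration.
    spool=$(qconf -sconf | awk '$1 == "execd_spool_dir" { print $2 }')
    if [ -n "$spool" ]; then
        echo "$spool: $(spool_fs_kind "$spool")"
    fi
fi
```

Note that a host-specific configuration (qconf -sconf <hostname>) may
override the global execd_spool_dir, so it is worth checking that too.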

-- Reuti


> Thanks,
> Jonny
>
>
> -- 
> Jonathan J Hunt <jjh at 42quarks.com>
> Homepage: http://www.42quarks.net.nz/wiki/JJH
> (Further contact details there)
> "Physics isn't the most important thing. Love is." Richard Feynman
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net





