[GE users] Job does not exist

Jonathan Hunt jjh at 42quarks.com
Mon Sep 1 15:18:39 BST 2008



Hi,

I am using SGE 6.2 running on OS X 10.5.4 Server. Shares are NFS automaps.

I am getting the following errors in the sge_execd log files on my
nodes, and as a result jobs never complete:

09/01/2008 23:46:30|  main|qbi-xgrid-02|E|shepherd of job 53.1 exited with exit status = 11
09/01/2008 23:46:30|  main|qbi-xgrid-02|W|reaping job "53" ptf complains: Job does not exist
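
In case it helps anyone reproduce this, here is a minimal sketch of how
the error and warning entries could be pulled out of an execd messages
file. It assumes the pipe-delimited layout seen above
(date time|thread|host|level|message) with each entry on a single line
in the actual file. The function name sge_errors is just an
illustrative helper, not part of SGE.

```shell
# Hypothetical helper: print only error (E) and warning (W) entries
# from an SGE messages file.
# Assumes the layout: date time|thread|host|level|message
sge_errors() {
    # Field 4 holds the severity letter when splitting on '|'.
    awk -F'|' '$4 == "E" || $4 == "W"' "$1"
}
```

It would be called with the path to a node's messages file as its only
argument; where that file lives depends on how the execd spool
directory was configured at install time.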

However, I can temporarily fix this by sshing into a particular node
and restarting the execd process:

    sudo killall sge_execd
    sudo /sge/default/common/sgeexecd

BUT this fix lasts only while the ssh session stays logged in to that
node. Quite reproducibly, as soon as I log out of the ssh session the
node reverts to failing to find the jobs.

Any idea of what is going on and how to fix it? Any help appreciated.

Thanks,
Jonny


-- 
Jonathan J Hunt <jjh at 42quarks.com>
Homepage: http://www.42quarks.net.nz/wiki/JJH
(Further contact details there)
"Physics isn't the most important thing. Love is." Richard Feynman
