[GE users] many "killing job that was not supposed to be there" messages in logs

engel_sanchez engel_sanchez at yahoo.com
Wed May 26 21:11:33 BST 2010


There weren't any old entries in the jobs directory for the nodes involved. In the end, things seem to have mostly gone back to normal after I restarted the execd daemons on the suspect nodes. However, I can still see the sge_shepherd for the old phantom job with ps (the shepherd processes for the good jobs running on the node at that moment all went away). I'm afraid to kill it: when I did that on the first node where the problem manifested, the node and the qmaster went into a panic, spewing lots of errors to the logs, and I can't remember what I did to stop it.

Thanks for replying. I'll limp along and pray there's nothing too terrible lurking in my cluster.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=258768
