[GE users] many "killing job that was not supposed to be there" messages in logs

engel_sanchez engel_sanchez at yahoo.com
Thu May 20 18:10:27 BST 2010


Hello. My qmaster messages file has many messages like the following:

05/20/2010 11:32:14|worker|head|E|execd at node024 reports running job (37267.1/2.node024) in queue "all.q at node024" that was not supposed to be there - killing


This job (37267) was initially running on node020 but got stuck a while ago. Ever since, node020 keeps getting stuck the same way (jobs remain in the dr state after being deleted), and the qmaster logs these error messages. The qmaster had to be restarted twice around that time, so I suspect that something in the spooling database might be corrupted. Any pointers on how to debug this further or fix it would be really appreciated. Thanks in advance!
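
In case it helps with triage, here is a minimal Python sketch for tallying these messages per job and execd host. It assumes every such message follows the layout of the line quoted above; the function name tally_stray_jobs is made up for illustration, and the pattern accepts both "@" and the " at " form that the list archive substitutes for it.

```python
import re
from collections import Counter

# Pattern modelled on the single log line quoted above (an assumption:
# other Grid Engine versions may word the message differently).
# The archive shows "@" as " at ", so both spellings are accepted.
PATTERN = re.compile(
    r'execd(?:@| at )(?P<host>\S+) reports running job '
    r'\((?P<job>\d+)\.(?P<task>\d+)/(?P<pe_task>[^)]*)\)'
)


def tally_stray_jobs(lines):
    """Count "not supposed to be there" reports per (job id, execd host)."""
    counts = Counter()
    for line in lines:
        # Cheap substring check first; skip unrelated log lines.
        if "not supposed to be there" not in line:
            continue
        match = PATTERN.search(line)
        if match:
            counts[(match.group("job"), match.group("host"))] += 1
    return counts
```

Passing the open qmaster messages file to tally_stray_jobs and printing the result of .most_common() would show whether all the reports come from one job and one node or from several.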

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257999
