[GE issues] [Issue 2794] New - qmaster kills all running jobs because of job state 65536

jlopez jlopez at cesga.es
Wed Nov 19 17:09:30 GMT 2008


http://gridengine.sunsource.net/issues/show_bug.cgi?id=2794
                 Issue #|2794
                 Summary|qmaster kills all running jobs because of job state 65536
               Component|gridengine
                 Version|6.1u3
                Platform|All
                     URL|
              OS/Version|Linux
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P2
            Subcomponent|qmaster
             Assigned to|ernst
             Reported by|jlopez






------- Additional comments from jlopez at sunsource.net Wed Nov 19 09:09:28 -0800 2008 -------
The symptoms were that at 14:57 all running jobs suddenly appeared in state
65536 for the qmaster, and a few seconds after that the qmaster killed them all.
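
As a side note on the number itself: 65536 is 2^16 (0x10000), so if the job
state is stored as a bit mask (an assumption, not something confirmed in this
report), exactly one flag, bit 16, was set. A minimal Python sketch to
decompose such a value; the helper name set_bits is hypothetical and not part
of Grid Engine:

```python
def set_bits(value):
    """Return the positions of the bits set in a non-negative integer."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


state = 65536                # value reported in the qmaster messages
print(hex(state))            # 0x10000
print(set_bits(state))       # [16]
```

Which flag bit 16 stands for would have to be looked up in the Grid Engine
sources for the version in use.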

These are the messages that appear in the qmaster logs:

11/13/2008 14:56:57|qmaster|cn142|E|execd cn068.null reports running 
state for job (691876.1/1.cn068) in queue "medium_queue at cn068.null" 
while job is in state 65536

11/13/2008 14:58:07|qmaster|cn142|E|execd at cn035.null reports running job 
(691876.1/1.cn035) in queue "medium_queue at cn035.null" that was not 
supposed to be there - killing

These two messages are repeated for every job that was running at that time.
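
To gauge how many jobs were affected, the messages file can be scanned for
these two message types. The following is only a rough Python sketch: it
assumes each entry is a single line of five "|"-separated fields (timestamp,
component, host, level, message), as in the excerpts above, and it matches on
the message wording quoted there. The function names and the SAMPLE list are
made up for illustration; in practice the lines would be read from the
qmaster messages file.

```python
import re
from collections import defaultdict
from datetime import datetime

# The two messages quoted above, joined back into single lines.
SAMPLE = [
    '11/13/2008 14:56:57|qmaster|cn142|E|execd cn068.null reports running '
    'state for job (691876.1/1.cn068) in queue "medium_queue at cn068.null" '
    'while job is in state 65536',
    '11/13/2008 14:58:07|qmaster|cn142|E|execd at cn035.null reports running '
    'job (691876.1/1.cn035) in queue "medium_queue at cn035.null" that was '
    'not supposed to be there - killing',
]

# Captures the text inside "job (...)" without interpreting it further.
JOB_RE = re.compile(r"job \(([^)]+)\)")


def parse_line(line):
    """Split one log line into its five fields; return None if malformed."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    stamp, component, host, level, message = parts
    try:
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return {"when": when, "component": component, "host": host,
            "level": level, "message": message}


def classify(message):
    """Label a message by the wording seen in this report."""
    if "while job is in state" in message:
        return "state_mismatch"
    if "not supposed to be there - killing" in message:
        return "killed"
    return "other"


def summarize(lines):
    """Collect the job identifiers seen for each message type."""
    jobs = defaultdict(set)
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["level"] != "E":
            continue
        kind = classify(rec["message"])
        match = JOB_RE.search(rec["message"])
        if kind != "other" and match:
            jobs[kind].add(match.group(1))
    return jobs


if __name__ == "__main__":
    for kind, ids in sorted(summarize(SAMPLE).items()):
        print(kind, len(ids), sorted(ids))
```

Comparing the two sets shows whether every job flagged with the state mismatch
was later killed.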

We are using the IA64 binaries under SLES10 SP1.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=89133
