[GE issues] [Issue 3194] New - sge_shepherd segfault on OpenSuSE 11.2 (x86_64)

megware stephan.ebelt at megware.com
Thu Nov 26 14:25:38 GMT 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3194
                 Issue #|3194
                 Summary|sge_shepherd segfault on OpenSuSE 11.2 (x86_64)
               Component|gridengine
                 Version|6.2u4
                Platform|PC
                     URL|
              OS/Version|Linux
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|execution
             Assigned to|pollinger
             Reported by|megware

------- Additional comments from megware at sunsource.net Thu Nov 26 06:25:37 -0800 2009 -------
The sge_shepherd seems to die immediately at job startup. When I submit a job with qsub (e.g. echo "sleep 10" | qsub) and watch the scheduling with qstat,
the job goes to the running state ('r') and disappears from the job list right afterwards.

The qmaster log always shows lines like these at that moment:
11/26/2009 15:01:38|worker|frontend1|I|removing trigger to terminate job 6.1
11/26/2009 15:01:38|worker|frontend1|W|job 6.1 failed on host node11.service assumedly after job because: job 6.1 died through signal SEGV (11)
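
To find every job affected in a longer qmaster log, lines of this shape could be filtered with a small script. This is only a sketch: the five-field, pipe-separated layout and the field names are inferred from the two lines quoted above, not taken from Grid Engine documentation, and the function names are hypothetical.

```python
import re

# Layout inferred from the quoted log lines (an assumption):
# timestamp|thread|host|level|message
FIELDS = ("timestamp", "thread", "host", "level", "message")

# Matches the "died through signal" phrase seen in the quoted warning line.
SIGNAL_RE = re.compile(
    r"job (?P<job>\d+\.\d+) died through signal (?P<name>\w+) \((?P<num>\d+)\)"
)

def parse_qmaster_line(line):
    """Split one log line into five fields; return None if it does not fit."""
    parts = line.strip().split("|", 4)
    if len(parts) != 5:
        return None
    return dict(zip(FIELDS, parts))

def signal_deaths(lines):
    """Yield (job_id, signal_name, signal_number) for jobs killed by a signal."""
    for line in lines:
        rec = parse_qmaster_line(line)
        if rec is None:
            continue
        m = SIGNAL_RE.search(rec["message"])
        if m:
            yield m.group("job"), m.group("name"), int(m.group("num"))

sample = [
    "11/26/2009 15:01:38|worker|frontend1|I|removing trigger to terminate job 6.1",
    "11/26/2009 15:01:38|worker|frontend1|W|job 6.1 failed on host node11.service "
    "assumedly after job because: job 6.1 died through signal SEGV (11)",
]

print(list(signal_deaths(sample)))  # [('6.1', 'SEGV', 11)]
```
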

Checking node11.service reveals this in dmesg:
sge_shepherd[16061]: segfault at 2b9670000000 ip 00002b9673450939 sp 00007fff37f61d40 error 4 in libc-2.10.1.so[2b96733d9000+151000]

(One such line appears per job started.)
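
The dmesg line can be decoded a bit further. The sketch below assumes the usual x86 Linux segfault message layout, in which the error value is a hexadecimal page-fault code (bit 0: protection violation rather than missing page, bit 1: write rather than read, bit 2: user mode) and the bracketed pair after the library name is the mapping base and size. The function and key names are hypothetical.

```python
import re

# Pattern for the kernel segfault message as quoted above; all numbers are hex.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)

def decode_segfault(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    info = {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        "cause": "protection violation" if err & 1 else "page not present",
    }
    if m.group("obj"):
        # Offset of the faulting instruction relative to the mapping base.
        info["object"] = m.group("obj")
        info["offset"] = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return info

sample_line = (
    "sge_shepherd[16061]: segfault at 2b9670000000 ip 00002b9673450939 "
    "sp 00007fff37f61d40 error 4 in libc-2.10.1.so[2b96733d9000+151000]"
)

info = decode_segfault(sample_line)
print(info["access"], info["mode"], info["cause"], hex(info["offset"]))
# read user page not present 0x77939
```

For the line above this gives a user-mode read of an unmapped page, with the faulting instruction at offset 0x77939 into the libc mapping. If that mapping starts at file offset 0 (an assumption), the offset could be looked up with a tool such as addr2line or gdb against the node's libc-2.10.1.so, given matching debug symbols.
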

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=229553
