[GE issues] [Issue 3194] sge_shepherd segfault on OpenSuSE 11.2 (x86_64)

megware stephan.ebelt at megware.com
Fri Nov 27 10:56:47 GMT 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3194



User megware changed the following:

                What    |Old value                 |New value
================================================================================
                  Status|RESOLVED                  |REOPENED
--------------------------------------------------------------------------------
              Resolution|WORKSFORME                |
--------------------------------------------------------------------------------




------- Additional comments from megware at sunsource.net Fri Nov 27 02:56:46 -0800 2009 -------
I modified /etc/init.d/sgeexecd to set MALLOC_CHECK_ right before sge_execd is invoked, and tried two approaches:

[...]
      export MALLOC_CHECK_=0
      $bin_dir/sge_execd
[...]

and

[...]
      MALLOC_CHECK_=0 $bin_dir/sge_execd
[...]

Neither made any difference in behaviour, i.e. the job still dies, a SIGSEGV is reported in the qmaster log, and a segfault line
appears in dmesg on the node.
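
To rule out the variable simply not reaching the daemon, the environment of the running process can be inspected. The following is only a sketch, assuming Linux with /proc mounted and permission to read /proc/<pid>/environ; the helper name proc_env_value is made up for illustration and is not part of Grid Engine.

```shell
# Hypothetical helper (not part of Grid Engine): print the value of an
# environment variable as it was passed to a running process at exec time.
# Assumes Linux with /proc mounted and read access to /proc/<pid>/environ.
proc_env_value() {
    pid=$1
    name=$2
    # environ is NUL-separated; turn it into one VAR=value entry per line,
    # then print only the value of the requested variable.
    tr '\0' '\n' < "/proc/$pid/environ" | sed -n "s/^$name=//p"
}

# Example: inspect the oldest running sge_execd, if there is one.
if pid=$(pgrep -o -x sge_execd 2>/dev/null); then
    proc_env_value "$pid" MALLOC_CHECK_
fi
```

If this prints nothing for a running sge_execd, the variable did not make it into the daemon's environment. Note that it only shows what sge_execd itself received, not what the shepherd or the job was started with.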

Interestingly, I have so far failed to reproduce this on a virtual machine installation here. The main difference is that the cluster has a
more complex setup involving NIS and automount for user homes (which I can't really simulate). Could that be related?

stephan

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=229751



