[GE users] sge_qmaster 6.2u5 daemon: repeating segfaults

m0zes adam.tygart at gmail.com
Fri Apr 16 14:02:09 BST 2010

I believe I have seen these segfaults recently. In my case, it seems
that some finished jobs had not been cleaned up in the exec host spool
directories. Once I cleaned up the old jobs, the qmaster was stable.
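
A minimal sketch of how such leftovers might be spotted, for illustration
only. It assumes that each job leaves one subdirectory under some
per-host spool directory and that a directory untouched for several days
is a candidate for inspection. The function name, the age threshold and
the directory passed in are all hypothetical; nothing here is taken from
the Grid Engine sources, and the script only reports, it does not delete.

```python
import os
import time
from pathlib import Path


def find_stale_job_dirs(spool_dir, max_age_days=7.0, now=None):
    """Return subdirectories of spool_dir not modified for max_age_days.

    Assumption: one subdirectory per job. The real spool layout may
    differ between installations, so check it before relying on this.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    root = Path(spool_dir)
    if not root.is_dir():
        # Nothing to scan; treat a missing directory as "no stale jobs".
        return []
    stale = []
    for entry in sorted(root.iterdir()):
        # Only directories are considered; plain files are ignored.
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            stale.append(entry)
    return stale
```

Calling find_stale_job_dirs() on a spool directory returns a sorted list
of paths. Anything it reports should be compared against the jobs qstat
still lists before it is removed by hand.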

I have no idea whether or not this is the exact problem everyone else
has been experiencing, but I figured this might be a starting point.

Next time I see it, I'll try and grab some debug traces.


On Fri, Apr 16, 2010 at 06:16, fx <d.love at liverpool.ac.uk> wrote:
> dom <marco.donauer at sun.com> writes:
>> Currently I have no hint where and how I could step into this problem.
> I think it has to be done systematically -- figuring out how the list
> structures become invalid.  If that's not clear from the core dump, we
> need to instrument the program somehow to try to find where it happens.
> From experience I'm just surprised it's a new sort of bug that's not
> familiar to the developers, so I guess it's due to recent architectural
> changes.
> I'm willing to do a reasonable amount of work on this, or provide access
> to our cluster, though I'm not sure whether that will help, as we can't
> keep a debugging session open for long on a crashed qmaster.
> --
> Dave Love
> E-Science, Computing Services Department, University of Liverpool
> AKA fx at gnu.org
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=253653
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

