[GE users] sge_qmaster 6.2u5 daemon: repeating segfaults

fx d.love at liverpool.ac.uk
Thu Mar 25 11:17:19 GMT 2010



ah_sunsource <ahaupt at ifh.de> writes:

> Hi *,
>
> yesterday afternoon our SGE master started segfaulting again and again
> "out of the blue". No changes to the configuration have been done for
> weeks... Is there anyone else who has already seen this (output of
> dmesg)? :

I don't think the precise output is relevant but, yes, see recent
postings here from me and others, and issue #3251.  You're lucky if it's
stopped -- it hasn't here.  I suspect it's triggered by some particular
sort of job that was in your system at some stage and is in ours all the
time now, but I don't have any good guesses about what sort.  Currently
I'm just running qmaster under monit with a short check interval, which
works because fortunately we don't have a high throughput.
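For anyone without monit, the same idea can be approximated with a small polling loop.  This is a minimal sketch, not how monit itself is implemented: it assumes qmaster writes its PID to a pidfile, and that `restart` is whatever command normally starts the master at your site.  The function names are hypothetical.

```python
import os
import time
from typing import Callable, Optional


def read_pid(pidfile: str) -> Optional[int]:
    """Return the PID stored in pidfile, or None if it is missing or unusable."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None
    # PID 0 or negative values would address process groups, not a process.
    return pid if pid > 0 else None


def is_alive(pid: int) -> bool:
    """Probe a process with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def check_once(pidfile: str, restart: Callable[[], None]) -> bool:
    """One polling cycle: call restart() if the PID is missing or dead.

    Returns True if a restart was triggered.
    """
    pid = read_pid(pidfile)
    if pid is None or not is_alive(pid):
        restart()
        return True
    return False


def watch(pidfile: str, restart: Callable[[], None], interval: float = 10.0) -> None:
    """Poll forever; a short interval keeps downtime after a crash small."""
    while True:
        check_once(pidfile, restart)
        time.sleep(interval)
```

Both the pidfile location and the restart command are site-specific assumptions here.  One limitation: a stale pidfile whose PID has since been reused by an unrelated process would be reported as alive.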

I'd be very grateful for any hints from developers on how to debug the
corrupt list entries.  I've found it difficult to get to grips
with the code base in the time I've had to look at it so far.  From
experience with similar things, I suspect it's about a week's work to
get to the bottom of it, and I don't have that time.  (I can supply core
dumps and/or a binary compiled `-g -O0' on RedHat 5 if anyone else with
the problem would like to take a look.)
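If anyone with the problem wants to produce cores of their own to compare, the daemon has to run with a non-zero core-file size limit.  A minimal sketch, assuming the master is started from a wrapper you control rather than directly by an init script: raise the soft RLIMIT_CORE to the hard limit in the child before exec.  The function names are hypothetical.  Whether the kernel actually writes a core for a daemon that changes user IDs also depends on system settings such as fs.suid_dumpable and the core pattern, which this does not touch.

```python
import resource
import subprocess
from typing import List, Tuple


def enable_core_dumps() -> Tuple[int, int]:
    """Raise the soft core-file size limit as far as the hard limit allows.

    An unprivileged process may raise its soft limit up to, but not past,
    the hard limit.  If the hard limit is 0, cores stay disabled.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def launch_with_cores(cmd: List[str]) -> subprocess.Popen:
    """Start cmd with core dumps enabled in the child only.

    preexec_fn runs in the child after fork() and before exec(), so the
    parent's own limits are left unchanged.
    """
    return subprocess.Popen(cmd, preexec_fn=enable_core_dumps)
```

Resource limits are inherited across fork(), so processes the daemon forks while detaching keep the raised limit.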

-- 
(Dr) Dave Love
E-Science, Computing Services Department, University of Liverpool
AKA fx at gnu.org

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=251318
