[GE users] seg fault with SGE 6.2u5 server

fx d.love at liverpool.ac.uk
Tue Dec 28 21:19:01 GMT 2010


tvsingh <tvsingh at ucla.edu> writes:

> Hello there,
>
> We have a decent-sized cluster that executes some 3000 jobs a day on average. I have been looking at this setup closely for the last couple of weeks and noticed the following errors in the system's messages file:
>
> Dec  9 10:27:18 localhost kernel: sge_qmaster[20498]: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000483e3988 error 4
> Dec  9 10:28:48 localhost kernel: sge_qmaster[20826]: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000484a5988 error 4
> Dec  9 10:52:03 localhost kernel: sge_qmaster[21880]: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000486ac988 error 4
> Dec 10 00:55:46 localhost kernel: sge_qmaster[7994]: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000484df988 error 4
>
> The server is based on the SGE 6.2u5 binaries and the OS is CentOS
> 5.x. I have also noticed that the qmaster's memory usage often keeps
> increasing for no visible reason, which leads the server to crash.

There's at least one known cause of qmaster SEGVs fixed by the source
you can get from https://arc.liv.ac.uk/trac/SGE, as posted about here
many times.  I'm not aware of specific memory leaks that might be cured
by it, though.
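
For what it's worth, the four kernel lines quoted above all report the
same fault address (0x80) and the same instruction pointer, which is
consistent with one recurring bug rather than several unrelated ones.
Error code 4 on x86 means a user-mode read of a page that is not
mapped, and an address as low as 0x80 is what you would typically see
when a field is read through a NULL pointer. The Python below is only a
minimal sketch of how to group such lines and decode the error code. It
is not part of SGE, and the names (PATTERN, decode_error, summarize) are
made up for illustration.

```python
import re
from collections import Counter

# Kernel "segfault" lines copied from the report quoted above.
LOG = """\
Dec  9 10:27:18 localhost kernel: sge_qmaster[20498]: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000483e3988 error 4
Dec  9 10:28:48 localhost kernel: sge_qmaster[20826]: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000484a5988 error 4
Dec  9 10:52:03 localhost kernel: sge_qmaster[21880]: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000486ac988 error 4
Dec 10 00:55:46 localhost kernel: sge_qmaster[7994]: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000484df988 error 4
"""

# Hypothetical pattern for the message layout seen in the lines above.
PATTERN = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # 0: page not mapped, 1: protection fault
        "write": bool(code & 2),         # 0: read access, 1: write access
        "user_mode": bool(code & 4),     # 0: kernel mode, 1: user mode
    }


def summarize(log):
    """Count crashes per (fault address, instruction pointer, error code)."""
    counts = Counter()
    for line in log.splitlines():
        m = PATTERN.search(line)
        if m:
            key = (int(m["addr"], 16), int(m["rip"], 16), int(m["err"], 16))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (addr, rip, err), n in summarize(LOG).items():
        print(f"{n} crash(es): addr={addr:#x} rip={rip:#x} {decode_error(err)}")
```

Grouping on the instruction pointer only says the crashes happen at the
same place. To see which function that is, you would still need a core
file or a backtrace from a debugger.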

-- 
Dave Love
Advanced Research Computing, Computing Services, University of Liverpool
AKA fx at gnu.org

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=310560
