[GE users] seg fault with SGE 6.2u5 server
d.love at liverpool.ac.uk
Tue Dec 28 21:19:01 GMT 2010
tvsingh <tvsingh at ucla.edu> writes:
> Hello there,
> We have a decent-sized cluster that executes some 3000 jobs a day on average. I started looking at this setup closely over the last couple of weeks and noticed the following errors in the system's messages file:
> Dec 9 10:27:18 localhost kernel: sge_qmaster: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000483e3988 error 4
> Dec 9 10:28:48 localhost kernel: sge_qmaster: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000484a5988 error 4
> Dec 9 10:52:03 localhost kernel: sge_qmaster: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000486ac988 error 4
> Dec 10 00:55:46 localhost kernel: sge_qmaster: segfault at 0000000000000080 rip 0000003b01079a30 rsp 00000000484df988 error 4
> The server is based on the SGE 6.2u5 binaries and the OS is CentOS
> 5.x. I have also noticed that the qmaster's memory usage often keeps
> increasing for no visible reason, which leads the server to crash.
There's at least one known cause of qmaster SEGVs fixed by the source
you can get from https://arc.liv.ac.uk/trac/SGE, as posted about here
many times. I'm not aware of specific memory leaks that might be cured
by it, though.
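
As a rough aid for reading the kernel lines quoted above, here is a minimal
Python sketch that parses that log format and decodes the low three bits of
the x86 page-fault error code. The names SEGFAULT_RE, decode_error and
parse_segfault are made up for illustration, and the pattern only covers the
older "rip ... rsp ..." layout shown in the quote; newer kernels print the
fields differently. The error code is treated as hexadecimal, which is how
the kernel prints it.

```python
import re

# Matches only the older x86-64 layout quoted in this thread
# ("... segfault at ADDR rip RIP rsp RSP error CODE").
SEGFAULT_RE = re.compile(
    r"kernel: (?P<prog>\S+): segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Decode the low three bits of the x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {
        "prog": m.group("prog"),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
    }
    info.update(decode_error(int(m.group("err"), 16)))
    return info


# First line from the quoted messages file.
SAMPLE = (
    "Dec 9 10:27:18 localhost kernel: sge_qmaster: segfault at "
    "0000000000000080 rip 0000003b01079a30 rsp 00000000483e3988 error 4"
)

if __name__ == "__main__":
    print(parse_segfault(SAMPLE))
```

For the quoted lines this decodes "error 4" as a user-mode read of a page
that is not mapped, at address 0x80. All four reports share the same rip, so
they probably come from the same instruction; a fault address that low is
consistent with reading through a null pointer, but that is an inference
from the log, not something confirmed against the qmaster source.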
Advanced Research Computing, Computing Services, University of Liverpool
AKA fx at gnu.org