[GE users] SGE 6.2-u2_1 qmaster segfault

m0zes adam.tygart at gmail.com
Mon Jun 29 18:35:50 BST 2009


Hello Everyone,

I am wondering if this issue is something unique to my installation,
or if someone else has seen something similar.

I am running SGE 6.2-u2_1 on Gentoo Linux hosts using kernel
2.6.27-r6. I am regularly seeing sge_qmaster segfault. I have been
trying to chase down this issue, but haven't had much luck.

It seems that qmaster segfaults when certain jobs exit, almost as if
sge_shepherd is sending back bad responses. I have turned on
debugging and haven't seen anything out of the ordinary, just a
segfault at the end. I have also tried strace'ing the process, with
the same result: nothing unusual until the segfault.

Today I came across the qping utility, and its output ends the
"info" field with a WARNING status:

mozes@athena ~ $ qping -info athena.beocat 1000 qmaster 1
06/29/2009 12:29:39:
SIRM version:             0.1
SIRM message id:          1
start time:               06/29/2009 12:05:48 (1246295148)
run time [s]:             1431
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 121
status:                   1
info:                     MAIN: E (1430.58) | signaler000: E (1425.00) | event_master000: E (0.03) | timer000: E (0.03) | worker000: E (0.14) | worker001: E (1.38) | listener000: E (0.23) | listener001: E (0.87) | scheduler000: E (5.03) | WARNING
malloc:                   arena(0) |ordblks(1) | smblks(0) | hblksr(0) | hblhkd(0) usmblks(0) | fsmblks(0) | uordblks(0) | fordblks(0) | keepcost(0)
Monitor:                  disabled
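
To compare the per-thread values in the "info" field between runs, a
small Python sketch like the following could split the field into
entries. It assumes every entry has the form "name: STATE (number)"
separated by "|", and that a trailing bare word such as WARNING is the
overall status. The names INFO, ENTRY and parse_info are hypothetical,
and the qping output does not say what the number in parentheses
means, so the script only sorts by it and draws no conclusion.

```python
import re

# The "info" field copied from the qping output above, line wraps joined.
INFO = (
    "MAIN: E (1430.58) | signaler000: E (1425.00) | event_master000: E (0.03) "
    "| timer000: E (0.03) | worker000: E (0.14) | worker001: E (1.38) "
    "| listener000: E (0.23) | listener001: E (0.87) | scheduler000: E (5.03) "
    "| WARNING"
)

# Assumed entry layout: "<thread name>: <state letter> (<number>)".
ENTRY = re.compile(r"^(?P<name>\w+):\s+(?P<state>\w+)\s+\((?P<value>[\d.]+)\)$")


def parse_info(info):
    """Return ([(name, state, value), ...], trailing_status_or_None)."""
    threads = []
    status = None
    for field in info.split("|"):
        field = field.strip()
        if not field:
            continue
        match = ENTRY.match(field)
        if match:
            threads.append(
                (match["name"], match["state"], float(match["value"]))
            )
        else:
            # Anything that is not a thread entry is kept as the status word.
            status = field
    return threads, status


threads, status = parse_info(INFO)

# List threads with the largest value first.
for name, state, value in sorted(threads, key=lambda t: t[2], reverse=True):
    print(f"{name:16s} {state} {value:10.2f}")
print("overall:", status)
```

With the output above, MAIN and signaler000 sort to the top; both
values are close to the reported run time of 1431 seconds.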


