[GE users] 6.0 seg fault in qmon on jobs button

Don Shesnicky dshesnicky at enqsemi.com
Tue Jul 13 21:58:37 BST 2004


 
I have what I think is a good compile of the source from CVS on the
12th, running on Red Hat 7.2 with Berkeley DB. The reason I went to
downloading the source was that the 6.0 binary qmon would seg fault
whenever I pressed any button. After installing, I could press any
button, but now that I've submitted jobs, pressing the Job Control
button gives a seg fault. (I'm not entirely sure that I tested the
Job Control button after installing the execution hosts, but I
definitely did after installing the master.)
 
Below is a list of all of the SGE processes and a tail of an strace on
the qmon process.
 
I can probably work without the Job Control menu, but I was really
hoping to get a fully working version via the CVS checkout.
 
Don
 
sgeadmin 19169     1  0 Jul12 ?        00:00:00 /tools/sge/6.0/bin/lx24-x86/sge_qmaster
root     19171 19169  0 Jul12 ?        00:00:06 /tools/sge/6.0/bin/lx24-x86/sge_qmaster
root     19172 19171  0 Jul12 ?        00:00:03 /tools/sge/6.0/bin/lx24-x86/sge_qmaster
root     19174 19171  0 Jul12 ?        00:00:00 /tools/sge/6.0/bin/lx24-x86/sge_qmaster
root     19175 19171  0 Jul12 ?        00:00:28 /tools/sge/6.0/bin/lx24-x86/sge_qmaster
root     19176 19171  0 Jul12 ?        00:00:10 /tools/sge/6.0/bin/lx24-x86/sge_qmaster
sgeadmin 19178 19171  0 Jul12 ?        00:00:01 /tools/sge/6.0/bin/lx24-x86/sge_qmaster
sgeadmin 19179 19171  0 Jul12 ?        00:00:00 /tools/sge/6.0/bin/lx24-x86/sge_qmaster
sgeadmin 19180 19171  0 Jul12 ?        00:00:00 /tools/sge/6.0/bin/lx24-x86/sge_qmaster
sgeadmin 19181 19171  0 Jul12 ?        00:00:03 /tools/sge/6.0/bin/lx24-x86/sge_qmaster
sgeadmin 19184     1  0 Jul12 ?        00:00:34 /tools/sge/6.0/bin/lx24-x86/sge_schedd

<snip>
[pid  9824] --- SIGRT_0 (Real-time signal 0) ---
[pid  9824] rt_sigprocmask(SIG_SETMASK, [32], NULL, 8) = 0
[pid  9824] gettimeofday({1089751763, 5691}, NULL) = 0
[pid  9824] rt_sigprocmask(SIG_BLOCK, NULL, [32], 8) = 0
[pid  9824] rt_sigprocmask(SIG_UNBLOCK, [32], [32], 8) = 0
[pid  9824] gettimeofday({1089751763, 5908}, NULL) = 0
[pid  9824] nanosleep({0, 999783000}, 0) = -1 EINTR (Interrupted system call)
[pid  9824] --- SIGRT_0 (Real-time signal 0) ---
[pid  9824] rt_sigprocmask(SIG_SETMASK, [32], NULL, 8) = 0
[pid  9824] gettimeofday({1089751763, 43407}, NULL) = 0
[pid  9824] brk(0x8412000)              = 0x8412000
[pid  9824] brk(0x8413000)              = 0x8413000
[pid  9824] brk(0x8414000)              = 0x8414000
[pid  9824] brk(0x8415000)              = 0x8415000
[pid  9824] brk(0x8416000)              = 0x8416000
[pid  9824] brk(0x8417000)              = 0x8417000
[pid  9824] brk(0x8418000)              = 0x8418000
[pid  9824] brk(0x8419000)              = 0x8419000
[pid  9824] brk(0x841a000)              = 0x841a000
[pid  9824] brk(0x841b000)              = 0x841b000
[pid  9824] brk(0x841c000)              = 0x841c000
[pid  9824] brk(0x841d000)              = 0x841d000
[pid  9824] brk(0x841e000)              = 0x841e000
[pid  9824] brk(0x841f000)              = 0x841f000
[pid  9824] brk(0x8420000)              = 0x8420000
[pid  9824] brk(0x8421000)              = 0x8421000
[pid  9824] brk(0x8422000)              = 0x8422000
[pid  9824] brk(0x8423000)              = 0x8423000
[pid  9824] brk(0x8424000)              = 0x8424000
[pid  9824] brk(0x8425000)              = 0x8425000
[pid  9824] brk(0x8426000)              = 0x8426000
[pid  9824] brk(0x8427000)              = 0x8427000
[pid  9824] brk(0x8428000)              = 0x8428000
[pid  9824] brk(0x8429000)              = 0x8429000
[pid  9824] brk(0x842a000)              = 0x842a000
[pid  9824] brk(0x842b000)              = 0x842b000
[pid  9824] brk(0x842c000)              = 0x842c000
[pid  9824] brk(0x842d000)              = 0x842d000
[pid  9824] brk(0x842e000)              = 0x842e000
[pid  9824] brk(0x842f000)              = 0x842f000
[pid  9824] brk(0x8430000)              = 0x8430000
[pid  9824] brk(0x8431000)              = 0x8431000
[pid  9824] brk(0x8432000)              = 0x8432000
[pid  9824] brk(0x8433000)              = 0x8433000
[pid  9824] brk(0x8434000)              = 0x8434000
[pid  9824] brk(0x8435000)              = 0x8435000
[pid  9824] brk(0x8436000)              = 0x8436000
[pid  9824] brk(0x8437000)              = 0x8437000
[pid  9824] brk(0x8438000)              = 0x8438000
[pid  9824] brk(0x8439000)              = 0x8439000
[pid  9824] brk(0x843a000)              = 0x843a000
[pid  9824] brk(0x843b000)              = 0x843b000
[pid  9824] brk(0x843c000)              = 0x843c000
[pid  9824] brk(0x843d000)              = 0x843d000
[pid  9824] brk(0x843e000)              = 0x843e000
[pid  9824] brk(0x843f000)              = 0x843f000
[pid  9824] brk(0x8440000)              = 0x8440000
[pid  9824] brk(0x8441000)              = 0x8441000
[pid  9824] brk(0x8442000)              = 0x8442000
[pid  9824] brk(0x8443000)              = 0x8443000
[pid  9824] brk(0x8444000)              = 0x8444000
[pid  9824] brk(0x8445000)              = 0x8445000
[pid  9824] brk(0x8446000)              = 0x8446000
[pid  9824] brk(0x8447000)              = 0x8447000
[pid  9824] writev(6, [{"F\0\5\0\300\1\300\1\"\1\300\1\0\0\0\0\5\0\20\0>\1\7\0\273"..., 2044}, {"\2\0\2\0\376\377\376\377\v\0\0\0\376\377\2\0", 16}], 2) = 2060
[pid  9824] --- SIGSEGV (Segmentation fault) ---
[pid  9824] +++ killed by SIGSEGV +++
[pid  9825] <... poll resumed> [{fd=3, events=POLLIN}], 1, 2000) = 0
[pid  9825] getppid()                   = 1
[pid  9825] kill(9829, SIGKILL)         = 0
[pid  9825] --- SIGRT_1 (Real-time signal 1) ---
<... nanosleep resumed> 0)              = -1 EINTR (Interrupted system call)
+++ killed by SIGKILL +++
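
The trace above shows pid 9824 extending its heap one page at a time
(54 consecutive brk calls) before the final writev and the SIGSEGV.
For a longer trace, that pattern can be tallied with a short script.
What follows is only a minimal sketch in Python: the function name
summarize_heap_growth and the regular expressions are illustrative
assumptions matched to the line format shown in this trace, not part
of strace or of Grid Engine.

```python
import re

# Matches lines such as: [pid  9824] brk(0x8412000)              = 0x8412000
BRK_RE = re.compile(
    r"^\[pid\s+(\d+)\]\s+brk\((0x[0-9a-f]+)\)\s+=\s+(0x[0-9a-f]+)"
)
# Matches lines such as: [pid  9824] --- SIGSEGV (Segmentation fault) ---
SEGV_RE = re.compile(r"^\[pid\s+(\d+)\]\s+--- SIGSEGV")


def summarize_heap_growth(lines):
    """Return {pid: (first_break, last_break, n_calls)} for pids that hit SIGSEGV.

    first_break and last_break are the values returned by the first and
    last brk call seen for that pid; n_calls counts the brk calls.
    """
    breaks = {}
    crashed = set()
    for line in lines:
        m = BRK_RE.match(line)
        if m:
            pid = int(m.group(1))
            result = int(m.group(3), 16)
            first, _, count = breaks.get(pid, (result, result, 0))
            breaks[pid] = (first, result, count + 1)
            continue
        m = SEGV_RE.match(line)
        if m:
            crashed.add(int(m.group(1)))
    # Only report processes that actually crashed.
    return {pid: info for pid, info in breaks.items() if pid in crashed}


# A shortened sample taken from the trace above.
sample = [
    "[pid  9824] brk(0x8412000)              = 0x8412000",
    "[pid  9824] brk(0x8413000)              = 0x8413000",
    "[pid  9824] brk(0x8447000)              = 0x8447000",
    "[pid  9824] --- SIGSEGV (Segmentation fault) ---",
]

for pid, (first, last, count) in summarize_heap_growth(sample).items():
    print(f"pid {pid}: {count} brk calls, break moved by {last - first} bytes")
```

A count like this only describes what the process did before it died;
it does not identify the faulting code. A backtrace from a debugger on
the core file would be needed for that.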



