[GE users] sge master dying

Iwona Sakrejda isakrejda at lbl.gov
Wed Jun 13 19:01:03 BST 2007



Sorry, been a while since I did active development and debugging.
Here it is (and actually I can do qconf for users and queues,
just this qconf for hostgroups is giving me grief...)


iwona


[root@pc2533 debug]# gdb /common/sge/6.0u4/bin/lx24-x86/sge_qmaster 16569
GNU gdb Red Hat Linux (6.1post-1.20040607.17rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Attaching to program: /chos/software/sge/6.0u4/bin/lx24-x86/sge_qmaster, process 16569
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread -1220095328 (LWP 16569)]
[New Thread -1317291088 (LWP 16704)]
[New Thread -1306801232 (LWP 16703)]
[New Thread -1296307280 (LWP 16702)]
[New Thread -1285555280 (LWP 16701)]
[New Thread -1265304656 (LWP 16575)]
[New Thread -1254814800 (LWP 16574)]
[New Thread -1244324944 (LWP 16573)]
[New Thread -1233835088 (LWP 16572)]
[New Thread -1223345232 (LWP 16570)]
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /chos/software/sge/6.0u4/lib/lx24-x86/libspoolc.so...done.
Loaded symbols for /software/sge/6.0u4/lib/lx24-x86/libspoolc.so
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
0xb75acd58 in pthread_join () from /lib/tls/libpthread.so.0
(gdb) cont
Continuing.

Program received signal SIGBUS, Bus error.
[Switching to Thread -1317291088 (LWP 16704)]
0x0809a007 in hgroup_mod ()
(gdb) quit
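
The session above stops at the SIGBUS without printing a stack. One way to capture the trace without typing at the (gdb) prompt is to run gdb in batch mode with a small command file. The sketch below is only an illustration: the helper names write_command_file and build_gdb_argv are hypothetical, and it assumes the installed gdb accepts the -batch and -x options and the "thread apply all bt" command. It only builds and prints the command line; it does not attach to anything.

```python
import os
import tempfile

# gdb commands to run once attached: resume the process, then, when it
# stops on the signal, print the faulting thread's stack ("where") and
# the stacks of all threads.
GDB_COMMANDS = ["continue", "where", "thread apply all bt"]


def write_command_file(commands, directory=None):
    """Write one gdb command per line to a temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".gdb", dir=directory, text=True)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(commands) + "\n")
    return path


def build_gdb_argv(binary, pid, command_file):
    """Return the argv for attaching gdb to `pid` in batch mode."""
    return ["gdb", "-batch", "-x", command_file, binary, str(pid)]


if __name__ == "__main__":
    # Binary path and PID are the ones from the session above.
    cmd_file = write_command_file(GDB_COMMANDS)
    argv = build_gdb_argv(
        "/common/sge/6.0u4/bin/lx24-x86/sge_qmaster", 16569, cmd_file
    )
    # Print the command so it can be reviewed and run by hand as root.
    print(" ".join(argv))
```

Running the printed command and then triggering the crash from another terminal should leave the backtrace on gdb's standard output, which can be redirected to a file and posted to the list.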


Rayson Ho wrote:
> Use the gdb sub-command "where" to show the stack trace...
>
> Rayson
>
>
>
> On 6/13/07, Iwona Sakrejda <isakrejda at lbl.gov> wrote:
>> Here is what I see when it crashes while attached to gdb:
>> [root@pc2533 debug]# gdb /common/sge/6.0u4/bin/lx24-x86/sge_qmaster 16569
>> GNU gdb Red Hat Linux (6.1post-1.20040607.17rh)
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
>>
>> Attaching to program: /chos/software/sge/6.0u4/bin/lx24-x86/sge_qmaster, process 16569
>> Reading symbols from /lib/libdl.so.2...done.
>> Loaded symbols for /lib/libdl.so.2
>> Reading symbols from /lib/tls/libm.so.6...done.
>> Loaded symbols for /lib/tls/libm.so.6
>> Reading symbols from /lib/tls/libpthread.so.0...done.
>> [Thread debugging using libthread_db enabled]
>> [New Thread -1220095328 (LWP 16569)]
>> [New Thread -1317291088 (LWP 16704)]
>> [New Thread -1306801232 (LWP 16703)]
>> [New Thread -1296307280 (LWP 16702)]
>> [New Thread -1285555280 (LWP 16701)]
>> [New Thread -1265304656 (LWP 16575)]
>> [New Thread -1254814800 (LWP 16574)]
>> [New Thread -1244324944 (LWP 16573)]
>> [New Thread -1233835088 (LWP 16572)]
>> [New Thread -1223345232 (LWP 16570)]
>> Loaded symbols for /lib/tls/libpthread.so.0
>> Reading symbols from /lib/tls/libc.so.6...done.
>> Loaded symbols for /lib/tls/libc.so.6
>> Reading symbols from /lib/ld-linux.so.2...done.
>> Loaded symbols for /lib/ld-linux.so.2
>> Reading symbols from /lib/libnss_files.so.2...done.
>> Loaded symbols for /lib/libnss_files.so.2
>> Reading symbols from /chos/software/sge/6.0u4/lib/lx24-x86/libspoolc.so...done.
>> Loaded symbols for /software/sge/6.0u4/lib/lx24-x86/libspoolc.so
>> Reading symbols from /lib/libnss_dns.so.2...done.
>> Loaded symbols for /lib/libnss_dns.so.2
>> Reading symbols from /lib/libresolv.so.2...done.
>> Loaded symbols for /lib/libresolv.so.2
>> 0xb75acd58 in pthread_join () from /lib/tls/libpthread.so.0
>> (gdb) cont
>> Continuing.
>>
>> Program received signal SIGBUS, Bus error.
>> [Switching to Thread -1317291088 (LWP 16704)]
>> 0x0809a007 in hgroup_mod ()
>> (gdb) quit
>>
>>
>>
>> Rayson Ho wrote:
>> > Can you attach qmaster with a debugger, so that we can get the stack
>> > trace when it dies??
>> >
>> > Rayson
>> >
>> >
>> >
>> > On 6/13/07, Iwona Sakrejda <isakrejda at lbl.gov> wrote:
>> >> Hi,
>> >>
>> >> I am running SGE 6.0u4 on RHEL3 and it's been running ok for a year
>> >> or so.
>> >> Last week I tried qconf -mhgrp and this command repeatedly kills all the
>> >> sge processes on the headnode. I connected with strace to the sgeadmin
>> >> before it died and I only see:
>> >> Process 16727 attached - interrupt to quit
>> >> futex(0xb03bebf8, FUTEX_WAIT, 16822, NULL) = -1 EINTR (Interrupted system call)
>> >> +++ killed by SIGBUS +++
>> >>
>> >> Nothing exciting in the logs, it's just going about its business...
>> >>
>> >> Suggestions on how to approach this problem would be appreciated...
>> >>
>> >> Thank You,
>> >>
>> >> iwona
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> >> For additional commands, e-mail: users-help at gridengine.sunsource.net
>> >>
>> >>
>> >
>>
>>
>>
>




