[GE users] SEGFAULT on sge_qmaster 6.0u1

Olesen, Mark Mark.Olesen at arvinmeritor.com
Fri Apr 22 11:35:48 BST 2005


Running 'strace -f .../sge_qmaster' suggests that it is the parent process
that crashes:

[pid 11376] gettimeofday({1114165748, 386248}, {4294967176, 0}) = 0
[pid 11376] write(6, "04/22/2005 12:29:08|qmaster|deal"..., 85) = 85
[pid 11376] close(6)                    = 0
[pid 11376] brk(0)                      = 0x8235000
[pid 11376] brk(0x8236000)              = 0x8236000
[pid 11376] brk(0)                      = 0x8236000
[pid 11376] brk(0x8237000)              = 0x8237000
[pid 11376] brk(0)                      = 0x8237000
[pid 11376] brk(0x8238000)              = 0x8238000
[pid 11376] brk(0)                      = 0x8238000
[pid 11376] brk(0x8239000)              = 0x8239000
[pid 11376] brk(0)                      = 0x8239000
[pid 11376] brk(0x823a000)              = 0x823a000
[pid 11376] brk(0)                      = 0x823a000
[pid 11376] brk(0x823b000)              = 0x823b000
[pid 11376] brk(0)                      = 0x823b000
[pid 11376] brk(0x823c000)              = 0x823c000
[pid 11376] brk(0)                      = 0x823c000
[pid 11376] brk(0x823d000)              = 0x823d000
[pid 11376] brk(0)                      = 0x823d000
[pid 11376] brk(0x823e000)              = 0x823e000
[pid 11376] gettimeofday({1114165748, 392706}, {4294967176, 0}) = 0
[pid 11376] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
upeek: ptrace(PTRACE_PEEKUSER,11378,44,0): Operation not permitted
detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
upeek: ptrace(PTRACE_PEEKUSER,11380,44,0): Operation not permitted
detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
upeek: ptrace(PTRACE_PEEKUSER,11381,44,0): Operation not permitted
detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
upeek: ptrace(PTRACE_PEEKUSER,11379,44,0): Operation not permitted
detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
upeek: ptrace(PTRACE_PEEKUSER,11377,44,0): Operation not permitted
detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
upeek: ptrace(PTRACE_PEEKUSER,11382,44,0): Operation not permitted
detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
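
For longer traces, the process that received the signal can be picked out mechanically. The following is a minimal Python sketch, assuming the 'strace -f' output has been saved to a text file and that signal deliveries are printed in the "[pid N] --- SIGxxx (...) ---" form shown above. The function name and regular expression are illustrative only and are not part of strace or Grid Engine.

```python
import re

# Matches 'strace -f' signal-delivery lines such as:
#   [pid 11376] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
SIGNAL_LINE = re.compile(r"^\[pid\s+(\d+)\]\s+---\s+(SIG[A-Z0-9]+)\b")


def find_signalled_pids(lines, signal="SIGSEGV"):
    """Return the PIDs (in the order seen) that received the given signal."""
    pids = []
    for line in lines:
        match = SIGNAL_LINE.match(line)
        if match and match.group(2) == signal:
            pids.append(int(match.group(1)))
    return pids


# A few lines copied from the trace above.
sample = [
    "[pid 11376] brk(0x823e000)              = 0x823e000",
    "[pid 11376] gettimeofday({1114165748, 392706}, {4294967176, 0}) = 0",
    "[pid 11376] --- SIGSEGV (Segmentation fault) @ 0 (0) ---",
    "upeek: ptrace(PTRACE_PEEKUSER,11378,44,0): Operation not permitted",
]

print(find_signalled_pids(sample))  # prints [11376]
```

Only PID 11376 is reported here; the other PIDs appear solely in the ptrace error messages that follow the crash.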


BTW: I am using classic spooling.

Dr. Mark Olesen
Principal Engineer Thermofluids Analysis
ArvinMeritor Light Vehicle Systems
ArvinMeritor Emissions Technologies GmbH
Biberbachstr. 9
D-86154 Augsburg, GERMANY
tel: +49 (821) 4103 - 862
fax: +49 (821) 4103 - 7862
Mark.Olesen at ArvinMeritor.com

> -----Original Message-----
> From: Olesen, Mark [mailto:Mark.Olesen at arvinmeritor.com]
> Sent: Friday, April 22, 2005 12:05 PM
> To: GridEngine
> Subject: [GE users] SEGFAULT on sge_qmaster 6.0u1
> 
> After restarting, the qmaster daemon fails to start (lx24-x86) - actually it
> forks and then fails.
> 
> 
> AFAIK I haven't changed anything significant in the configuration (admin
> email address, complexes, load sensor) within the last while that should
> affect sge_qmaster.  Some time ago I did have a problem with spaces within a
> complex string preventing the files from being re-read, but I've since
> removed the problem.
> 
> The message file displays the following:
> 
> 04/22/2005 11:48:47|qmaster|dealog01|W|local configuration dealog01.zeunastaerker.de not defined - using global configuration
> 04/22/2005 11:48:48|qmaster|dealog01|I|read job database with 5 entries in 0 seconds
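
The message-file lines quoted above are pipe-separated, so they are easy to split when filtering a larger file. Below is a minimal Python sketch, assuming each entry occupies a single line with five fields. The field names (timestamp, daemon, host, level, text) are my own labels inferred from the two sample lines, not an official description of the format.

```python
from typing import NamedTuple, Optional


class MessageEntry(NamedTuple):
    # Field names are assumptions inferred from the sample lines.
    timestamp: str
    daemon: str
    host: str
    level: str
    text: str


def parse_message_line(line: str) -> Optional[MessageEntry]:
    """Split one message-file line into its five pipe-separated fields.

    Returns None for lines that do not have the expected shape.
    """
    # maxsplit=4 keeps any '|' characters inside the message text intact.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return MessageEntry(*parts)


entry = parse_message_line(
    "04/22/2005 11:48:48|qmaster|dealog01|I|read job database with 5 entries in 0 seconds"
)
print(entry.level, entry.text)
```

Filtering on the level field (for example, keeping only "W" entries) then becomes a one-line comprehension over the parsed entries.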
> 
> 
> Using debug level 'dl 1', I receive the following info:
> 
>    889  10995 16384     TSTSOS: 1 slots used (limit 1) -> suspended
>    890  10995 16384     qinstance "(null)"  suspended on subordinate
>    891  10995 16384     Due to other suspend states signal will NOT be delivered
>    892  10995 16384     QUEUE (null): queued signal STOP (retry after 60 seconds) host dealc02.zeunastaerker.de
>    893  10995 16384     te_delete_event: (t:5 u1:0 u2:0 s:(null))
>    894  10995 16384     te_add_event: (t:5 w:1114163876 m:1 s:(null))
> Segmentation fault
> 
> With debug level 'dl 2' I receive the following info:
> 
> 
>   7228  11006 16384 <-- te_add_event() ../daemons/qmaster/sge_qmaster_timed_event.c 345 }
>   7229  11006 16384 --> te_free_event() {
>   7230  11006 16384 <-- te_free_event() ../daemons/qmaster/sge_qmaster_timed_event.c 259 }
>   7231  11006 16384 --> signal_slave_jobs_in_queue() {
> Segmentation fault
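
The 'dl 2' trace marks function entry with "-->" and exit with "<--", so the functions still open when the output stops can be found by replaying those markers against a stack. Here is a minimal Python sketch, assuming the three leading numeric columns can be skipped and that the trace comes from a single thread (a multi-threaded trace would need one stack per thread). The function name and regular expression are illustrative, not part of Grid Engine.

```python
import re

# Three numeric columns, then an entry (-->) or exit (<--) marker and a
# function name followed by "()".
TRACE_LINE = re.compile(r"^\s*\d+\s+\d+\s+\d+\s+(-->|<--)\s+(\w+)\(\)")


def open_calls(lines):
    """Return functions entered but not exited, outermost first."""
    stack = []
    for line in lines:
        match = TRACE_LINE.match(line)
        if not match:
            continue  # e.g. the final "Segmentation fault" line
        arrow, func = match.groups()
        if arrow == "-->":
            stack.append(func)
        elif stack and stack[-1] == func:
            stack.pop()
        # Exits without a matching entry (the excerpt starts mid-trace)
        # are ignored.
    return stack


# Lines from the trace above, with the quoting prefix removed.
sample = [
    "  7228  11006 16384 <-- te_add_event()",
    "  7229  11006 16384 --> te_free_event() {",
    "  7230  11006 16384 <-- te_free_event()",
    "  7231  11006 16384 --> signal_slave_jobs_in_queue() {",
    "Segmentation fault",
]

print(open_calls(sample))  # prints ['signal_slave_jobs_in_queue']
```

On this excerpt the only unclosed entry is signal_slave_jobs_in_queue(), which is consistent with the crash occurring inside that function or something it calls.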
> 
> 
> Based on these messages, where should I start looking to sort out the
> problem?

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



