[GE users] SEGFAULT on sge_qmaster 6.0u1

Olesen, Mark Mark.Olesen at arvinmeritor.com
Fri Apr 22 12:36:16 BST 2005


I found a solution to the problem, but don't understand the cause.
I removed the jobs/ and job_scripts/ dirs and now the qmaster starts without
a problem.

Seems very strange!
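
For anyone repeating this, a more cautious variant is to move the two
directories aside rather than delete them, so the spooled job files can still
be inspected later. What follows is only a minimal sketch. It assumes classic
spooling with jobs/ and job_scripts/ sitting directly under the qmaster spool
directory. The helper name set_aside_job_spool and the example path in the
usage comment are hypothetical, and sge_qmaster should be stopped before
running it.

```shell
#!/bin/sh
# Move the classic-spooling job directories aside instead of deleting them.
# set_aside_job_spool is a hypothetical helper, not part of Grid Engine.
set_aside_job_spool() {
    spool_dir=$1    # qmaster spool directory (layout is an assumption)
    backup_dir="$spool_dir/jobs_backup.$(date +%Y%m%d%H%M%S)"

    mkdir -p "$backup_dir" || return 1
    for d in jobs job_scripts; do
        # Only move what actually exists; a missing directory is not an error.
        if [ -d "$spool_dir/$d" ]; then
            mv "$spool_dir/$d" "$backup_dir/$d" || return 1
        fi
    done
    # Print the backup location so the caller can inspect or restore it.
    echo "$backup_dir"
}

# Usage (the path is an assumption; adjust it to your installation):
#   set_aside_job_spool "$SGE_ROOT/$SGE_CELL/spool/qmaster"
```

Keeping the old directories makes it possible to search them afterwards for
the job file that triggers the crash; which file that is remains unknown here.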


Dr. Mark Olesen
Principal Engineer Thermofluids Analysis
ArvinMeritor Light Vehicle Systems
ArvinMeritor Emissions Technologies GmbH
Biberbachstr. 9
D-86154 Augsburg, GERMANY
tel: +49 (821) 4103 - 862
fax: +49 (821) 4103 - 7862
Mark.Olesen at ArvinMeritor.com

> -----Original Message-----
> From: Olesen, Mark [mailto:Mark.Olesen at arvinmeritor.com]
> Sent: Friday, April 22, 2005 12:36 PM
> To: 'users at gridengine.sunsource.net'
> Subject: RE: [GE users] SEGFAULT on sge_qmaster 6.0u1
> 
> Using 'strace -f .../sge_qmaster' it would appear that the parent process
> has the problem:
> 
> [pid 11376] gettimeofday({1114165748, 386248}, {4294967176, 0}) = 0
> [pid 11376] write(6, "04/22/2005 12:29:08|qmaster|deal"..., 85) = 85
> [pid 11376] close(6)                    = 0
> [pid 11376] brk(0)                      = 0x8235000
> [pid 11376] brk(0x8236000)              = 0x8236000
> [pid 11376] brk(0)                      = 0x8236000
> [pid 11376] brk(0x8237000)              = 0x8237000
> [pid 11376] brk(0)                      = 0x8237000
> [pid 11376] brk(0x8238000)              = 0x8238000
> [pid 11376] brk(0)                      = 0x8238000
> [pid 11376] brk(0x8239000)              = 0x8239000
> [pid 11376] brk(0)                      = 0x8239000
> [pid 11376] brk(0x823a000)              = 0x823a000
> [pid 11376] brk(0)                      = 0x823a000
> [pid 11376] brk(0x823b000)              = 0x823b000
> [pid 11376] brk(0)                      = 0x823b000
> [pid 11376] brk(0x823c000)              = 0x823c000
> [pid 11376] brk(0)                      = 0x823c000
> [pid 11376] brk(0x823d000)              = 0x823d000
> [pid 11376] brk(0)                      = 0x823d000
> [pid 11376] brk(0x823e000)              = 0x823e000
> [pid 11376] gettimeofday({1114165748, 392706}, {4294967176, 0}) = 0
> [pid 11376] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> upeek: ptrace(PTRACE_PEEKUSER,11378,44,0): Operation not permitted
> detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
> upeek: ptrace(PTRACE_PEEKUSER,11380,44,0): Operation not permitted
> detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
> upeek: ptrace(PTRACE_PEEKUSER,11381,44,0): Operation not permitted
> detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
> upeek: ptrace(PTRACE_PEEKUSER,11379,44,0): Operation not permitted
> detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
> upeek: ptrace(PTRACE_PEEKUSER,11377,44,0): Operation not permitted
> detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
> upeek: ptrace(PTRACE_PEEKUSER,11382,44,0): Operation not permitted
> detach: ptrace(PTRACE_DETACH, ...): Operation not permitted
> 
> 
> BTW: I am using classic spooling
> 
> > -----Original Message-----
> > From: Olesen, Mark [mailto:Mark.Olesen at arvinmeritor.com]
> > Sent: Friday, April 22, 2005 12:05 PM
> > To: GridEngine
> > Subject: [GE users] SEGFAULT on sge_qmaster 6.0u1
> >
> > After restarting, the qmaster daemon fails to start (lx24-x86) - actually it
> > forks and then fails.
> >
> >
> > AFAIK I haven't changed anything significant in the configuration (admin
> > email address, complexes, load sensor) recently that should affect
> > sge_qmaster.  Some time ago I did have a problem with spaces within a
> > complex string preventing the files from being re-read, but I've since
> > removed the problem.
> >
> > The message file displays the following:
> >
> > 04/22/2005 11:48:47|qmaster|dealog01|W|local configuration dealog01.zeunastaerker.de not defined - using global configuration
> > 04/22/2005 11:48:48|qmaster|dealog01|I|read job database with 5 entries in 0 seconds
> >
> >
> > Using debug level 'dl 1' I receive the following info:
> >
> >    889  10995 16384     TSTSOS: 1 slots used (limit 1) -> suspended
> >    890  10995 16384     qinstance "(null)"  suspended on subordinate
> >    891  10995 16384     Due to other suspend states signal will NOT be delivered
> >    892  10995 16384     QUEUE (null): queued signal STOP (retry after 60 seconds) host dealc02.zeunastaerker.de
> >    893  10995 16384     te_delete_event: (t:5 u1:0 u2:0 s:(null))
> >    894  10995 16384     te_add_event: (t:5 w:1114163876 m:1 s:(null))
> > Segmentation fault
> >
> > With debug level 'dl 2' I receive the following info:
> >
> >
> >   7228  11006 16384 <-- te_add_event() ../daemons/qmaster/sge_qmaster_timed_event.c 345 }
> >   7229  11006 16384 --> te_free_event() {
> >   7230  11006 16384 <-- te_free_event() ../daemons/qmaster/sge_qmaster_timed_event.c 259 }
> >   7231  11006 16384 --> signal_slave_jobs_in_queue() {
> > Segmentation fault
> >
> >
> > Based on these messages, where should I start looking to sort out the
> > problem?
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> 




