[GE users] sge_shepherd crashes

Roger Herikstad roger.herikstad at gmail.com
Fri Oct 31 02:13:27 GMT 2008


Hi list,
 I was hoping someone could help with a problem we are having. We are
running a cluster of 7 Macs, all on OS X 10.5.5; some are G5s, some
are Mac Pros. Recently, the sge_shepherd process has been crashing on
the PPC machines almost immediately after a job starts running on
them. Is there a known issue with some of the recent security updates
from Apple? The problem only surfaced after we installed those
updates. Below is the crash report from one of the PPCs:

Process:         sge_shepherd [36463]
Path:            /cluster/sge/bin/darwin-ppc/sge_shepherd
Identifier:      sge_shepherd
Version:         ??? (???)
Code Type:       PPC (Native)
Parent Process:  sge_execd [139]

Date/Time:       2008-10-31 10:00:38.833 +0800
OS Version:      Mac OS X 10.5.5 (9F33)
Report Version:  6

Exception Type:  EXC_BAD_ACCESS (SIGBUS)
Exception Codes: 0x000000000000000a, 0x000000000026a868
Crashed Thread:  0

Thread 0 Crashed:
0   dyld  0x8fe16f4c ImageLoaderMachO::findExportedSymbol(char const*, void const*, bool, ImageLoader const**) const + 412
1   dyld  0x8fe13dec ImageLoaderMachO::resolveUndefined(ImageLoader::LinkContext const&, macho_nlist const*, bool, ImageLoader const**) + 992
2   dyld  0x8fe142e4 ImageLoaderMachO::doBindIndirectSymbolPointers(ImageLoader::LinkContext const&, bool, bool, bool) + 572
3   dyld  0x8fe0da14 ImageLoader::recursiveBind(ImageLoader::LinkContext const&, bool) + 140
4   dyld  0x8fe0d9e4 ImageLoader::recursiveBind(ImageLoader::LinkContext const&, bool) + 92
5   dyld  0x8fe1103c ImageLoader::link(ImageLoader::LinkContext const&, bool, bool, ImageLoader::RPathChain const&) + 336
6   dyld  0x8fe05250 dyld::link(ImageLoader*, bool, ImageLoader::RPathChain const&) + 372
7   dyld  0x8fe07fb4 dyld::_main(mach_header const*, unsigned long, int, char const**, char const**, char const**) + 3024
8   dyld  0x8fe01770 dyldbootstrap::start(mach_header const*, int, char const**, long) + 988
9   dyld  0x8fe01044 _dyld_start + 56

Thread 0 crashed with PPC Thread State 32:
  srr0: 0x8fe16f4c  srr1: 0x0000d030   dar: 0x0026a868 dsisr: 0x40000000
    r0: 0x00000d40    r1: 0xbfffe380    r2: 0x00003500    r3: 0x0026f448
    r4: 0x00267368    r5: 0x0025a044    r6: 0x000007f2    r7: 0x000006a0
    r8: 0x0000054e    r9: 0x00278edf   r10: 0x0017ad55   r11: 0x0017ad55
   r12: 0x8fe16db0   r13: 0x00000001   r14: 0x00177180   r15: 0x8fe312cc
   r16: 0x00000000   r17: 0x00000001   r18: 0x8fe312dc   r19: 0x8fe327a0
   r20: 0x0000000c   r21: 0x00145400   r22: 0x00000001   r23: 0x8fe34800
   r24: 0x00000001   r25: 0x0017ad56   r26: 0xbfffe558   r27: 0x00000000
   r28: 0x8fe348cc   r29: 0xffffffed   r30: 0x00264aa8   r31: 0x8fe16dbc
    cr: 0x84002084   xer: 0x00000000    lr: 0x8fe16dbc   ctr: 0x8fe16db0
vrsave: 0x00000000

Binary Images:
    0x1000 -   0x110ff3 +sge_shepherd ??? (???) /cluster/sge/bin/darwin-ppc/sge_shepherd
  0x145000 -   0x16fff7 +libssl.0.9.7.dylib ??? (???) <5dac2e94552ad76696c35bd6886f5a92> /cluster/sge/lib/darwin-ppc/libssl.0.9.7.dylib
  0x17e000 -   0x238fff +libcrypto.0.9.7.dylib ??? (???) <4ea3d7e9a1c28ac7b17ed80873fe6598> /cluster/sge/lib/darwin-ppc/libcrypto.0.9.7.dylib
0x8fe00000 - 0x8fe30b23  dyld 96.2 (???) <39109181acbf30fed542e6c9abcf1798> /usr/lib/dyld
0x901ea000 - 0x90383fe3  libSystem.B.dylib ??? (???) <787ea59c19201d04a507b13d2bb3f9ac> /usr/lib/libSystem.B.dylib
0x907ce000 - 0x907d9ffb  libgcc_s.1.dylib ??? (???) <ea47fd375407f162c76d14d64ba246cd> /usr/lib/libgcc_s.1.dylib
0x952bc000 - 0x952c1ff6  libmathCommon.A.dylib ??? (???) /usr/lib/system/libmathCommon.A.dylib
0xffff8000 - 0xffff9703  libSystem.B.dylib ??? (???) /usr/lib/libSystem.B.dylib
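
For what it's worth, the second exception code (0x000000000026a868)
matches the dar register in the thread state, so that should be the
data address dyld was touching when it faulted. Below is a minimal
Python sketch for checking which loaded image, if any, an address from
the report falls in. It is not part of SGE; the find_image helper and
the IMAGES table are made up for illustration, and the ranges are
simply copied by hand from the Binary Images list above.

```python
# Address ranges copied by hand from the "Binary Images" section of the
# crash report. (start, end, name); both bounds are inclusive.
IMAGES = [
    (0x00001000, 0x00110ff3, "sge_shepherd"),
    (0x00145000, 0x0016fff7, "libssl.0.9.7.dylib"),
    (0x0017e000, 0x00238fff, "libcrypto.0.9.7.dylib"),
    (0x8fe00000, 0x8fe30b23, "dyld"),
    (0x901ea000, 0x90383fe3, "libSystem.B.dylib"),
    (0x907ce000, 0x907d9ffb, "libgcc_s.1.dylib"),
    (0x952bc000, 0x952c1ff6, "libmathCommon.A.dylib"),
    (0xffff8000, 0xffff9703, "libSystem.B.dylib"),
]


def find_image(address, images=IMAGES):
    """Return the name of the image whose range contains address, or None."""
    for start, end, name in images:
        if start <= address <= end:
            return name
    return None


if __name__ == "__main__":
    fault_address = 0x0026a868    # dar / second exception code
    program_counter = 0x8fe16f4c  # srr0 / frame 0 of the crashed thread

    print(hex(fault_address), "->", find_image(fault_address))
    print(hex(program_counter), "->", find_image(program_counter))
```

With the values from this report, the program counter lands in dyld,
as expected, while the faulting address is not inside any of the
listed images (it lies past the end of libcrypto.0.9.7.dylib). That
does not identify the cause by itself, but it may help narrow down
where dyld was reading from.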

I would be very happy if anyone could offer some help, or point me in
the right direction on this issue. Thanks a lot!

~ Roger
