[GE users] sge_shepherd segfaults

shaas stefan.haas at sun.com
Fri Jan 15 10:04:58 GMT 2010


On 01/14/10 19:09, kisielk wrote:
>> I'm still trying to get SGE running on OpenSUSE 11.1. I'm using the CVS version to work around the LDAP user problems I was having earlier.
>>
>> I now have job submission working for all my users. However, whenever a job begins to run, sge_shepherd immediately segfaults.
>>
>> From dmesg:
>> sge_shepherd[21485]: segfault at 7f7878000000 ip 00007f787a630bc7 sp 00007fff833bc980 error 4 in libc-2.9.so[7f787a5bb000+14f000]
>>
>> From the qmaster messages file:
>> 01/13/2010 15:42:19|worker|demo|W|job 7.1 failed on host demo.lan.zymeworks.com assumedly after job because: job 7.1 died through signal SEGV (11)
>>
>>
>> I tried compiling with -no-opt and -debug but was still unable to get any more information than what's above.
>>
>> How can I go about debugging this problem?
> 
> After compiling SGE with -no-jemalloc everything appears to work. This problem appears to be related to the use of jemalloc, much like the one I had with nss_ldap.
> 

When did you last update your CVS repository?
I committed a fix to our jemalloc on 2010-01-05 which seems to be
related to your problem!
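
On the question of how to debug this: the dmesg line quoted above already
contains the instruction pointer and the address where libc is mapped, so
the faulting offset inside libc is just ip minus that base. A minimal
sketch of the arithmetic follows. The function name and the regular
expression are my own for illustration and are not part of SGE; the
pattern assumes the kernel message has the same layout as the line in
this thread.

```python
import re

# The dmesg line quoted earlier in this thread.
DMESG_LINE = (
    "sge_shepherd[21485]: segfault at 7f7878000000 ip 00007f787a630bc7 "
    "sp 00007fff833bc980 error 4 in libc-2.9.so[7f787a5bb000+14f000]"
)

# Assumed layout: "... ip <hex> sp <hex> error <hex> in <object>[<base>+<size>]"
SEGFAULT_RE = re.compile(
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error [0-9a-f]+ "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def faulting_offset(line):
    """Return (object name, offset of the faulting instruction in its mapping)."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a kernel segfault message")
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    size = int(match.group("size"), 16)
    offset = ip - base
    # Sanity check: the instruction pointer must lie inside the reported mapping.
    if not 0 <= offset < size:
        raise ValueError("instruction pointer is outside the reported mapping")
    return match.group("obj"), offset


if __name__ == "__main__":
    obj, offset = faulting_offset(DMESG_LINE)
    print(f"{obj}+{offset:#x}")  # prints "libc-2.9.so+0x75bc7"
```

With symbols for libc installed, that offset can then be looked up in the
library, for example with addr2line or gdb, to see which libc function the
crash happened in. Whether that points at an allocator routine would help
confirm the jemalloc suspicion.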

Stefan

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=238945
