[GE users] Tools segfault for users in LDAP

templedf dan.templeton at sun.com
Tue Jan 5 14:04:57 GMT 2010


Looking at that stack trace, the segfault is in libc, probably caused by 
something in your libnss_ldap.  The routine that's triggering the 
segfault (sge_gdi2_setup()) is called any time any command or daemon 
tries to start a conversation with the qmaster, so if it were really a 
Grid Engine issue, you would not have been the first to find it.  Have 
you tried updating your libnss_ldap?
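One way to test that theory is to do the same lookup outside Grid Engine. The function below is a minimal sketch and is not part of SGE; `probe_group_lookup` is a hypothetical name. It resolves a GID with getgrgid_r(3), the libc call that sge_getgrgid_r makes in the backtrace. If the host's /etc/nsswitch.conf sends the group database to LDAP (an assumption about this setup), the call goes through the same _nss_ldap_getgrgid_r path. Calling it with getgid() from a small main() while logged in as an LDAP user should abort with the same "free(): invalid pointer" message if libnss_ldap is at fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <grp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not SGE code): resolve one GID through the NSS
 * "group" database with getgrgid_r(3), the same libc entry point seen in
 * the backtrace. Returns 0 if an entry was found, 1 if none exists,
 * -1 on error. */
int probe_group_lookup(gid_t gid, char *name_out, size_t name_len)
{
    long size = sysconf(_SC_GETGR_R_SIZE_MAX);
    if (size <= 0)
        size = 16384; /* no fixed limit reported; use a generous default */

    char *buf = malloc((size_t)size);
    if (buf == NULL)
        return -1;

    struct group grp;
    struct group *result = NULL;
    int rc = getgrgid_r(gid, &grp, buf, (size_t)size, &result);

    int status;
    if (rc != 0) {
        fprintf(stderr, "getgrgid_r(%ld): %s\n", (long)gid, strerror(rc));
        status = -1;
    } else if (result == NULL) {
        status = 1; /* lookup succeeded but no such group */
    } else {
        /* gr_name points into buf, so copy it out before freeing buf */
        if (name_len > 0)
            snprintf(name_out, name_len, "%s", result->gr_name);
        status = 0;
    }

    free(buf);
    return status;
}
```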

Digging a little deeper, it looks like the cfree() call at the top of 
the stack is an oddity.  From the cfree(3) man page:

> This function should never be used. Use free(3) instead. 

Why the LDAP library is calling cfree() is beyond me.  I also just had a 
look at the sge_gid2group() and sge_getgrgid_r() functions, and I can't 
see anything in them that could be the source of a bad pointer.
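For comparison, this is roughly what a careful caller of getgrgid_r(3) looks like. It is a hypothetical sketch in the spirit of sge_gid2group(), not the actual Grid Engine code, and `gid_to_group_name` is an invented name. The point it illustrates is buffer ownership: the caller allocates the scratch buffer, the NSS module only fills it in, and the caller frees it. An invalid pointer passed to free() from inside libnss_ldap therefore most likely comes from the module's own internal allocations rather than from anything the caller handed it. The ERANGE retry covers entries that do not fit in the initial buffer, which can happen with LDAP groups that have many members.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical wrapper, NOT the Grid Engine source. Returns a malloc'd
 * copy of the group name for gid, or NULL if there is no entry or the
 * lookup fails. The caller must free() the returned string. */
char *gid_to_group_name(gid_t gid)
{
    size_t size = 1024;                /* start small, grow on ERANGE */
    const size_t max_size = 1u << 20;  /* stop growing at 1 MiB */

    for (;;) {
        char *buf = malloc(size);
        if (buf == NULL)
            return NULL;

        struct group grp;
        struct group *result = NULL;
        int rc = getgrgid_r(gid, &grp, buf, size, &result);

        if (rc == ERANGE && size < max_size) {
            /* Entry did not fit: release this buffer, retry with double. */
            free(buf);
            size *= 2;
            continue;
        }

        char *name = NULL;
        if (rc == 0 && result != NULL)
            name = strdup(result->gr_name); /* copy before buf goes away */

        /* buf was allocated here, so it is freed here; the NSS module
         * only writes into it and never takes ownership of it. */
        free(buf);
        return name;
    }
}
```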

As a side note, this is exactly why I love Linux so much.  There's 
nothing quite as much fun as debugging the kernel to get your 
application working. :)

Daniel

kisielk wrote:
> I'm trying to get SGE (6.2u4 or 6.2u5) up and running on an OpenSUSE 11.1 host. The daemon runs fine, and all the end-user tools work fine for local users. However, if a user who is defined in LDAP tries to use a tool such as qstat, it segfaults. The backtrace output is as follows:
>
> *** glibc detected *** bin/lx24-amd64/qstat: free(): invalid pointer: 0x00007fc0af12b0e0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x7fc0af27d108]
> /lib64/libc.so.6(cfree+0x76)[0x7fc0af27ec66]
> /lib64/libnss_ldap.so.2[0x7fc0aece1c21]
> /lib64/libnss_ldap.so.2[0x7fc0aecdd913]
> /lib64/libnss_ldap.so.2[0x7fc0aecdbc22]
> /lib64/libnss_ldap.so.2(_nss_ldap_getgrgid_r+0x53)[0x7fc0aecdc313]
> /lib64/libc.so.6(getgrgid_r+0xec)[0x7fc0af2a848c]
> bin/lx24-amd64/qstat(sge_getgrgid_r+0xc5)[0x53ee85]
> bin/lx24-amd64/qstat(sge_gid2group+0x6a)[0x53e3aa]
> bin/lx24-amd64/qstat(sge_setup2+0x1a1)[0x479911]
> bin/lx24-amd64/qstat(sge_gdi2_setup+0x121)[0x479c71]
> bin/lx24-amd64/qstat(main+0x125)[0x425ae5]
> /lib64/libc.so.6(__libc_start_main+0xe6)[0x7fc0af227586]
> bin/lx24-amd64/qstat(readdir_r+0xa2)[0x4258ea]
>
> I tried both the prebuilt binaries and a build compiled from source myself. Has anyone else seen this problem, and is there a known fix?
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=236395
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=236572
