[GE issues] [Issue 3192] New - double free corruption when getting groupid of ldap users

jdprasad johnnydevaprasad at gmail.com
Mon Nov 23 16:32:23 GMT 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3192
                 Issue #|3192
                 Summary|double free corruption when getting groupid of ldap users
               Component|gridengine
                 Version|6.2u4
                Platform|All
                     URL|http://d
              OS/Version|Linux
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P2
            Subcomponent|execution
             Assigned to|pollinger
             Reported by|jdprasad






------- Additional comments from jdprasad at sunsource.net Mon Nov 23 08:32:21 -0800 2009 -------
OS: SuSE Linux Enterprise Server 11 (SLES 11)

Scenario:
---------
test   is an LDAP user and
testgr is an LDAP group

root at sles-test:> su - test

test at sles-test:~>  id

uid=1003(test) gid=22222(testgr) groups=22222(testgr)
test at sles-test:~> qstat
*** glibc detected *** qstat: double free or corruption (out): 0x00002aaaaab0e180 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2aaaab3b5118]
/lib64/libc.so.6(cfree+0x76)[0x2aaaab3b6c76]
/lib64/libnss_ldap.so.2[0x2aaaabcd14fb]
/lib64/libnss_ldap.so.2[0x2aaaabcd18f7]
/lib64/libnss_ldap.so.2[0x2aaaabccfc22]
/lib64/libnss_ldap.so.2(_nss_ldap_getgrgid_r+0x53)[0x2aaaabcd0313]
/lib64/libnss_compat.so.2[0x2aaaab8a9b6b]
/lib64/libnss_compat.so.2(_nss_compat_getgrgid_r+0xf8)[0x2aaaab8a9d28]
/lib64/libc.so.6(getgrgid_r+0xec)[0x2aaaab3e047c]
qstat[0x531048]
qstat[0x531e8b]
qstat[0x469ab6]
qstat[0x469eb5]
qstat[0x40ae27]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x2aaaab35f586]
qstat[0x407089]
....
.....
......
test at sles-test:~> gdb `which qstat` core.26708
(gdb) bt
#0  0x00002aaaab373645 in raise () from /lib64/libc.so.6
#1  0x00002aaaab374c33 in abort () from /lib64/libc.so.6
#2  0x00002aaaab3af8e8 in ?? () from /lib64/libc.so.6
#3  0x00002aaaab3b5118 in ?? () from /lib64/libc.so.6
#4  0x00002aaaab3b6c76 in free () from /lib64/libc.so.6
#5  0x00002aaaabcd14fb in ?? () from /lib64/libnss_ldap.so.2
#6  0x00002aaaabcd18f7 in ?? () from /lib64/libnss_ldap.so.2
#7  0x00002aaaabccfc22 in ?? () from /lib64/libnss_ldap.so.2
#8  0x00002aaaabcd0313 in _nss_ldap_getgrgid_r () from /lib64/libnss_ldap.so.2
#9  0x00002aaaab8a9b6b in ?? () from /lib64/libnss_compat.so.2
#10 0x00002aaaab8a9d28 in _nss_compat_getgrgid_r () from /lib64/libnss_compat.so.2
#11 0x00002aaaab3e047c in getgrgid_r () from /lib64/libc.so.6
#12 0x0000000000531048 in sge_getgrgid_r ()
#13 0x0000000000531e8b in sge_gid2group ()
#14 0x0000000000469ab6 in sge_setup2 ()
#15 0x0000000000469eb5 in sge_gdi2_setup ()
#16 0x000000000040ae27 in main ()

The abort triggered by the glibc malloc check can be suppressed by setting the
environment variable:
export MALLOC_CHECK_=0
With this set, the command succeeds.
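
For anyone who wants to apply this workaround to a single command without exporting the variable in the whole shell or patching SGE, here is a minimal sketch of a wrapper. It assumes that glibc reads MALLOC_CHECK_ when malloc is initialised in the new process, which is why the variable is set before exec. The helper name run_with_malloc_check_off and the WRAPPER_MAIN macro are made up for this example and are not part of SGE.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of SGE): run argv[0] with MALLOC_CHECK_=0
 * set only in the child's environment. Returns the child's exit status,
 * or -1 if the child could not be started or did not exit normally. */
static int run_with_malloc_check_off(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: set the variable before exec so that the new program
         * sees it from the very start. */
        if (setenv("MALLOC_CHECK_", "0", 1) != 0) {
            _exit(127);
        }
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

#ifdef WRAPPER_MAIN
/* Build with -DWRAPPER_MAIN to get a standalone wrapper binary. */
int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 2;
    }
    int rc = run_with_malloc_check_off(&argv[1]);
    return rc < 0 ? 1 : rc;
}
#endif
```

Built with -DWRAPPER_MAIN, the resulting binary would be invoked with qstat as its first argument. Like the export above, this only hides the symptom.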

The gdb backtrace shows that the crash occurs in free(), called from within
libnss_ldap, which is reached through the nss_compat getgrgid_r() call made by
sge_getgrgid_r(). The following patches to SGE work around the issue for now,
but I hope there will be a permanent fix for this.

--- sge-6.2u4/gridengine/source/daemons/shepherd/sge_shepherd_ijs.c     2009-07-10 17:59:17.000000000 +0200
+++ sge-6.2u4-new/gridengine/source/daemons/shepherd/sge_shepherd_ijs.c 2009-11-23 13:11:50.000000000 +0100
@@ -747,7 +747,7 @@
    THREAD_HANDLE     *thread_pty_to_commlib = NULL;
    THREAD_HANDLE     *thread_commlib_to_pty = NULL;
    cl_raw_list_t     *cl_com_log_list = NULL;
-
+   setenv("MALLOC_CHECK_", "0", 1);
    shepherd_trace("parent: starting parent loop with remote_host = %s, "
                   "remote_port = %d, job_owner = %s, fd_pty_master = %d, "
                   "fd_pipe_in = %d, fd_pipe_out = %d, "
----------------------------------------------------------------------------------------------------
--- sge-6.2u3/gridengine/source/libs/gdi/sge_gdi_ctx.c  2009-01-22 17:03:50.000000000 +0100
+++ sge-6.2u3_new/gridengine/source/libs/gdi/sge_gdi_ctx.c      2009-11-23 11:58:13.000000000 +0100
@@ -1875,6 +1875,7 @@
    u_long32 sge_execd_port = 0;
    bool from_services = false;

+   setenv("MALLOC_CHECK_", "0", 1);
    DENTER(TOP_LAYER, "sge_setup2");

    if (context == NULL) {
@@ -1948,7 +1949,7 @@
 {
    int ret = AE_OK;
    bool alpp_was_null = true;
-
+   setenv("MALLOC_CHECK_", "0", 1);
    DENTER(TOP_LAYER, "sge_gdi2_setup");

    if (context_ref && sge_gdi_ctx_is_setup(*context_ref)) {
@@ -1980,6 +1981,7 @@
    sge_prog_state_class_t* prog_state = thiz->get_sge_prog_state(thiz);
    int ret = CL_RETVAL_OK;

+   setenv("MALLOC_CHECK_", "0", 1);
    DENTER(TOP_LAYER, "gdi2_reresolve_qualified_hostname");

    ret=getuniquehostname(prog_state->get_qualified_hostname(prog_state), unique_hostname, 0);

This is probably not the ideal solution, since it only turns off the glibc
check and leaves the underlying double free in place.
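
To help narrow down whether the double free comes from SGE or from the NSS modules, the getgrgid_r() call at frame #11 of the backtrace can be exercised outside of SGE. What follows is a minimal sketch only, not the SGE code: lookup_group_name and the REPRO_MAIN macro are names invented for this example, and the retry on ERANGE is the usual pattern for getgrgid_r(), not necessarily what sge_getgrgid_r() does.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <grp.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper (not SGE code): resolve a gid to a group name with
 * getgrgid_r(), doubling the scratch buffer while the call reports ERANGE.
 * Returns 0 if found, 1 if there is no such group, -1 on error. */
static int lookup_group_name(gid_t gid, char *out, size_t outlen)
{
    size_t buflen = 1024;
    char *buf = NULL;
    struct group grp;
    struct group *result = NULL;
    int rc;

    for (;;) {
        char *tmp = realloc(buf, buflen);
        if (tmp == NULL) {
            free(buf);
            return -1;
        }
        buf = tmp;
        rc = getgrgid_r(gid, &grp, buf, buflen, &result);
        /* Stop unless the buffer was too small; cap growth at 1 MiB. */
        if (rc != ERANGE || buflen >= (1u << 20)) {
            break;
        }
        buflen *= 2;
    }

    int status;
    if (rc != 0) {
        status = -1;            /* lookup failed */
    } else if (result == NULL) {
        status = 1;             /* no entry for this gid */
    } else {
        snprintf(out, outlen, "%s", result->gr_name);
        status = 0;
    }
    free(buf);
    return status;
}

#ifdef REPRO_MAIN
/* Build with -DREPRO_MAIN to get a standalone test program. */
int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s gid\n", argv[0]);
        return 2;
    }
    gid_t gid = (gid_t)strtoul(argv[1], NULL, 10);
    char name[256];
    int rc = lookup_group_name(gid, name, sizeof name);
    if (rc == 0) {
        printf("gid %lu -> %s\n", (unsigned long)gid, name);
    } else if (rc == 1) {
        printf("gid %lu: no such group\n", (unsigned long)gid);
    } else {
        fprintf(stderr, "lookup for gid %lu failed\n", (unsigned long)gid);
    }
    return rc == 0 ? 0 : 1;
}
#endif
```

Built with -DREPRO_MAIN and run as the test user with gid 22222, this would show whether the lookup alone triggers the same glibc abort. If it does, the problem would seem to be below SGE, in the nss_ldap/nss_compat stack.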

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=228806
