Opened 7 years ago

Closed 7 years ago

#748 closed defect (duplicate)

IZ3192: double free corruption when getting groupid of ldap users

Reported by: jdprasad
Owned by:
Priority: high
Milestone:
Component: sge
Version: 6.2u4
Severity: minor
Keywords: Linux execution
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3192]

        Issue #:      3192             Platform:     All      Reporter: jdprasad (jdprasad)
       Component:     gridengine          OS:        Linux
     Subcomponent:    execution        Version:      6.2u4       CC:    None defined
        Status:       REOPENED         Priority:     P2
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    pollinger (pollinger)
      QA Contact:     pollinger
          URL:        http://d
       * Summary:     double free corruption when getting groupid of ldap users
   Status whiteboard:
      Attachments:

     Issue 3192 blocks:
   Votes for issue 3192:


   Opened: Mon Nov 23 09:32:00 -0700 2009 
------------------------


OS: SuSE Linux Enterprise Server 11 (SLES 11)

Scenario:
---------
test   is an LDAP user and
testgr is an LDAP group

root@sles-test:> su - test

test@sles-test:~>  id

uid=1003(test) gid=22222(testgr) groups=22222(testgr)
test@sles-test:~> qstat
*** glibc detected *** qstat: double free or corruption (out): 0x00002aaaaab0e180 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2aaaab3b5118]
/lib64/libc.so.6(cfree+0x76)[0x2aaaab3b6c76]
/lib64/libnss_ldap.so.2[0x2aaaabcd14fb]
/lib64/libnss_ldap.so.2[0x2aaaabcd18f7]
/lib64/libnss_ldap.so.2[0x2aaaabccfc22]
/lib64/libnss_ldap.so.2(_nss_ldap_getgrgid_r+0x53)[0x2aaaabcd0313]
/lib64/libnss_compat.so.2[0x2aaaab8a9b6b]
/lib64/libnss_compat.so.2(_nss_compat_getgrgid_r+0xf8)[0x2aaaab8a9d28]
/lib64/libc.so.6(getgrgid_r+0xec)[0x2aaaab3e047c]
qstat[0x531048]
qstat[0x531e8b]
qstat[0x469ab6]
qstat[0x469eb5]
qstat[0x40ae27]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x2aaaab35f586]
qstat[0x407089]
....
.....
......
test@sles-test:~> gdb `which qstat` core.26708
(gdb) bt
#0  0x00002aaaab373645 in raise () from /lib64/libc.so.6
#1  0x00002aaaab374c33 in abort () from /lib64/libc.so.6
#2  0x00002aaaab3af8e8 in ?? () from /lib64/libc.so.6
#3  0x00002aaaab3b5118 in ?? () from /lib64/libc.so.6
#4  0x00002aaaab3b6c76 in free () from /lib64/libc.so.6
#5  0x00002aaaabcd14fb in ?? () from /lib64/libnss_ldap.so.2
#6  0x00002aaaabcd18f7 in ?? () from /lib64/libnss_ldap.so.2
#7  0x00002aaaabccfc22 in ?? () from /lib64/libnss_ldap.so.2
#8  0x00002aaaabcd0313 in _nss_ldap_getgrgid_r () from /lib64/libnss_ldap.so.2
#9  0x00002aaaab8a9b6b in ?? () from /lib64/libnss_compat.so.2
#10 0x00002aaaab8a9d28 in _nss_compat_getgrgid_r () from /lib64/libnss_compat.so.2
#11 0x00002aaaab3e047c in getgrgid_r () from /lib64/libc.so.6
#12 0x0000000000531048 in sge_getgrgid_r ()
#13 0x0000000000531e8b in sge_gid2group ()
#14 0x0000000000469ab6 in sge_setup2 ()
#15 0x0000000000469eb5 in sge_gdi2_setup ()
#16 0x000000000040ae27 in main ()
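
The SGE frames in the trace above (sge_gid2group -> sge_getgrgid_r) end in a plain getgrgid_r() call. A standalone program that makes the same libc call outside SGE may help show whether the abort depends on SGE at all. The following is only a minimal sketch: lookup_group_name() is a hypothetical helper written for this ticket, not SGE code, and the 1 MiB buffer cap is an arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of SGE): resolve a gid to a group name
 * with getgrgid_r(), growing the scratch buffer while libc reports ERANGE.
 * Returns 0 on success, -1 if no entry exists, or an errno value on error.
 */
int lookup_group_name(gid_t gid, char *out, size_t outlen)
{
    long hint = sysconf(_SC_GETGR_R_SIZE_MAX);
    size_t buflen = (hint > 0) ? (size_t)hint : 1024;
    char *buf = NULL;
    struct group grp;
    struct group *result = NULL;
    int rc;

    assert(out != NULL && outlen > 0);

    for (;;) {
        char *tmp = realloc(buf, buflen);
        if (tmp == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = tmp;

        rc = getgrgid_r(gid, &grp, buf, buflen, &result);
        if (rc != ERANGE || buflen > (1u << 20)) {
            break;              /* done, or buffer cap (assumed) reached */
        }
        buflen *= 2;            /* buffer too small: retry with more room */
    }

    if (rc == 0 && result == NULL) {
        rc = -1;                /* lookup worked, but the gid is unknown */
    }
    if (rc == 0) {
        snprintf(out, outlen, "%s", result->gr_name);
    }
    free(buf);
    return rc;
}
```

Calling lookup_group_name(getgid(), name, sizeof name) from a small main() as the LDAP user exercises the same NSS path (nss_compat -> nss_ldap) without any SGE code in the process.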

The glibc malloc consistency check can be disabled by setting the environment variable:
export MALLOC_CHECK_=0
after which the command succeeds.

The gdb backtrace shows that the crash occurs inside the NSS library calls
(free() called from libnss_ldap, reached through libnss_compat).
The following patches to SGE work around the issue temporarily, but I hope there will
be a permanent fix for this.

--- sge-6.2u4/gridengine/source/daemons/shepherd/sge_shepherd_ijs.c     2009-07-10 17:59:17.000000000 +0200
+++ sge-6.2u4-new/gridengine/source/daemons/shepherd/sge_shepherd_ijs.c 2009-11-23 13:11:50.000000000 +0100
@@ -747,7 +747,7 @@
    THREAD_HANDLE     *thread_pty_to_commlib = NULL;
    THREAD_HANDLE     *thread_commlib_to_pty = NULL;
    cl_raw_list_t     *cl_com_log_list = NULL;
-
+   setenv("MALLOC_CHECK_", "0", 1);
    shepherd_trace("parent: starting parent loop with remote_host = %s, "
                   "remote_port = %d, job_owner = %s, fd_pty_master = %d, "
                   "fd_pipe_in = %d, fd_pipe_out = %d, "
----------------------------------------------------------------------------------------------------
--- sge-6.2u3/gridengine/source/libs/gdi/sge_gdi_ctx.c  2009-01-22 17:03:50.000000000 +0100
+++ sge-6.2u3_new/gridengine/source/libs/gdi/sge_gdi_ctx.c      2009-11-23 11:58:13.000000000 +0100
@@ -1875,6 +1875,7 @@
    u_long32 sge_execd_port = 0;
    bool from_services = false;

+   setenv("MALLOC_CHECK_", "0", 1);
    DENTER(TOP_LAYER, "sge_setup2");

    if (context == NULL) {
@@ -1948,7 +1949,7 @@
 {
    int ret = AE_OK;
    bool alpp_was_null = true;
-
+   setenv("MALLOC_CHECK_", "0", 1);
    DENTER(TOP_LAYER, "sge_gdi2_setup");

    if (context_ref && sge_gdi_ctx_is_setup(*context_ref)) {
@@ -1980,6 +1981,7 @@
    sge_prog_state_class_t* prog_state = thiz->get_sge_prog_state(thiz);
    int ret = CL_RETVAL_OK;

+   setenv("MALLOC_CHECK_", "0", 1);
    DENTER(TOP_LAYER, "gdi2_reresolve_qualified_hostname");

    ret=getuniquehostname(prog_state->get_qualified_hostname(prog_state), unique_hostname, 0);

This is probably not the ideal solution, since it only silences the glibc check and leaves the underlying memory error in place.

   ------- Additional comments from shaas Tue Nov 24 05:43:53 -0700 2009 -------
This is the same problem that is mentioned in issue 3193.

*** This issue has been marked as a duplicate of 3193 ***

   ------- Additional comments from jdprasad Mon Aug 16 04:34:32 -0700 2010 -------
This problem apparently still exists in sge 6.2u5
OS: SuSE Linux Enterprise Server (SLES11)
Kernel: 2.6.27.19-5-default
openldap: openldap2-devel-2.4.12-7.16
          openldap-db4-4.6.21-47_cm5.1
          openldap-servers-2.4.22-47_cm5.1
          openldap2-client-2.4.12-7.16

But this time it's not a double free or corruption; glibc reports an invalid pointer instead:

*** glibc detected *** qstat: free(): invalid pointer: 0x00002aaaaab19c00 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2aaaab3b5118]
/lib64/libc.so.6(cfree+0x76)[0x2aaaab3b6c76]
/usr/lib64/libldap-2.4.so.2(ldap_pvt_get_fqdn+0x6a)[0x2aaaabf1221a]
/usr/lib64/libldap-2.4.so.2(ldap_int_initialize+0x57)[0x2aaaabf108b7]
/usr/lib64/libldap-2.4.so.2(ldap_create+0x26)[0x2aaaabef6816]
/usr/lib64/libldap-2.4.so.2(ldap_initialize+0x2f)[0x2aaaabef6dff]
/lib64/libnss_ldap.so.2[0x2aaaabccc327]
/lib64/libnss_ldap.so.2[0x2aaaabccf411]
/lib64/libnss_ldap.so.2[0x2aaaabccfbcf]
/lib64/libnss_ldap.so.2(_nss_ldap_getpwuid_r+0x49)[0x2aaaabcd01b9]
/lib64/libnss_compat.so.2[0x2aaaab8aaab8]
/lib64/libnss_compat.so.2[0x2aaaab8aacad]
/lib64/libnss_compat.so.2(_nss_compat_getpwuid_r+0x100)[0x2aaaab8ab040]
/lib64/libc.so.6(getpwuid_r+0xec)[0x2aaaab3e1cfc]
qstat[0x54d032]
qstat[0x4704b1]
qstat[0x47090c]
qstat[0x40b9d6]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x2aaaab35f586]
...
...

gdb `which qstat` core.29665

(gdb) bt
#0  0x00002aaaab373645 in raise () from /lib64/libc.so.6
#1  0x00002aaaab374c33 in abort () from /lib64/libc.so.6
#2  0x00002aaaab3af8e8 in ?? () from /lib64/libc.so.6
#3  0x00002aaaab3b5118 in ?? () from /lib64/libc.so.6
#4  0x00002aaaab3b6c76 in free () from /lib64/libc.so.6
#5  0x00002aaaabf1221a in ldap_pvt_get_fqdn () from /usr/lib64/libldap-2.4.so.2
#6  0x00002aaaabf108b7 in ldap_int_initialize () from /usr/lib64/libldap-2.4.so.2
#7  0x00002aaaabef6816 in ldap_create () from /usr/lib64/libldap-2.4.so.2
#8  0x00002aaaabef6dff in ldap_initialize () from /usr/lib64/libldap-2.4.so.2
#9  0x00002aaaabccc327 in ?? () from /lib64/libnss_ldap.so.2
#10 0x00002aaaabccf411 in ?? () from /lib64/libnss_ldap.so.2
#11 0x00002aaaabccfbcf in ?? () from /lib64/libnss_ldap.so.2
#12 0x00002aaaabcd01b9 in _nss_ldap_getpwuid_r () from /lib64/libnss_ldap.so.2
#13 0x00002aaaab8aaab8 in ?? () from /lib64/libnss_compat.so.2
#14 0x00002aaaab8aacad in ?? () from /lib64/libnss_compat.so.2
#15 0x00002aaaab8ab040 in _nss_compat_getpwuid_r () from /lib64/libnss_compat.so.2
#16 0x00002aaaab3e1cfc in getpwuid_r () from /lib64/libc.so.6
#17 0x000000000054d032 in sge_uid2user ()
#18 0x00000000004704b1 in sge_setup2 ()
#19 0x000000000047090c in sge_gdi2_setup ()
#20 0x000000000040b9d6 in main ()
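
In this second trace the SGE side is sge_uid2user, which ends in getpwuid_r(). The same isolation idea applies to the passwd lookup. Again this is only a sketch under assumptions: lookup_user_name() is a hypothetical helper, not SGE code, and the buffer cap is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of SGE): resolve a uid to a user name
 * with getpwuid_r(), growing the scratch buffer while libc reports ERANGE.
 * Returns 0 on success, -1 if no entry exists, or an errno value on error.
 */
int lookup_user_name(uid_t uid, char *out, size_t outlen)
{
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t buflen = (hint > 0) ? (size_t)hint : 1024;
    char *buf = NULL;
    struct passwd pwd;
    struct passwd *result = NULL;
    int rc;

    assert(out != NULL && outlen > 0);

    for (;;) {
        char *tmp = realloc(buf, buflen);
        if (tmp == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = tmp;

        rc = getpwuid_r(uid, &pwd, buf, buflen, &result);
        if (rc != ERANGE || buflen > (1u << 20)) {
            break;              /* done, or buffer cap (assumed) reached */
        }
        buflen *= 2;            /* buffer too small: retry with more room */
    }

    if (rc == 0 && result == NULL) {
        rc = -1;                /* lookup worked, but the uid is unknown */
    }
    if (rc == 0) {
        snprintf(out, outlen, "%s", result->pw_name);
    }
    free(buf);
    return rc;
}
```

Calling lookup_user_name(getuid(), name, sizeof name) as the LDAP user goes through getpwuid_r() and the configured NSS modules, the path that reaches ldap_initialize() in the trace above.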

Change History (1)

comment:1 Changed 7 years ago by dlove

  • Resolution set to duplicate
  • Severity set to minor
  • Status changed from new to closed

Duplicate of IZ3193.
