[GE issues] [Issue 3121] New - sge clients/daemons segfault when bootstrap file is corrupted

templedf dan.templeton at sun.com
Thu Aug 27 16:04:54 BST 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3121
                 Issue #|3121
                 Summary|sge clients/daemons segfault when bootstrap file is corrupted
               Component|gridengine
                 Version|6.0
                Platform|All
                     URL|
              OS/Version|All
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P4
            Subcomponent|kernel
             Assigned to|andreas
             Reported by|templedf






------- Additional comments from templedf at sunsource.net Thu Aug 27 08:04:51 -0700 2009 -------
When the bootstrap file is corrupted, i.e. SGE cannot read the key-value pairs in this file, the client or daemon simply segfaults.

This is a simple test where the bootstrap file is empty:

 $  qstat
 error: fopen("/sge/test/BUILD/default/common/bootstrap") failed: No such file or directory

 $  touch default/common/bootstrap
 $ qstat
 Segmentation Fault (core dumped)

 $  file core
core:           ELF 32-bit LSB core file 80386 Version 1, from 'qstat'

 $  pstack core
core 'core' of 9722:    /sge/test/BUILD/bin/sol-x86/qstat
 d0dc4385 memchr   (823c27f, 80435e8, 8043160, 0) + 55
 d0e1d22b vsnprintf (80431a0, 400, 823c244, 80435e8) + 73
 08183fb1 sge_dstring_vsprintf_copy_append (8043a60, 8184428, 823c244, 80435e8) + 5d
 0818439e sge_dstring_sprintf (8043a60, 823c244, 81fefd8, 1, 82781a0) + 36
 08195983 sge_get_confval_array (82781a0, e, 9, 8043a70, 8043ae0, 8043a60) + 237
 081818dc sge_bootstrap_state_setup (82c0220, 82c0198, 8278150) + 1f0
 081810c9 sge_bootstrap_state_class_create (82c0198, 8278150) + 8d
 080c775e sge_gdi_ctx_setup (82ae068, d, 8202f0c, 0, 8202e5c, 8047480) + fa
 080c6f2d sge_gdi_ctx_class_create (d, 8202f0c, 0, 8202e5c, 8047480, 8047500) + 571
 080cc977 sge_setup2 (80477f4, d, 0, 80477d4, 0) + 43f
 080ccda3 sge_gdi2_setup (80477f4, d, 0, 80477d4) + 127
 0806cbc6 main     (1, 8047830, 8047838) + ee
 0806ca4a _start   (1, 8047904, 0, 804793c, 804794a, 804795e) + 7a

It should exit gracefully with a meaningful error message.

This was due to a coding error: the format string expected a string (%s), but a struct (bootstrap_entry_t) was passed as the argument to sge_dstring_sprintf.
Otherwise this case was handled correctly.
This was introduced in 6.0, when the function signature (apparently that of sge_get_confval_array, the caller in the trace above, since sge_dstring_sprintf is variadic) was changed to take bootstrap_entry_t[] instead of string[].
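
To illustrate the class of bug described above, here is a minimal, self-contained sketch. It is not the Grid Engine code: demo_entry_t is a hypothetical stand-in for bootstrap_entry_t (the real struct layout may differ), format_error is a hypothetical stand-in for sge_dstring_sprintf, and the message text is made up for the example.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for bootstrap_entry_t; the real layout may differ. */
typedef struct {
    const char *name;   /* attribute name, e.g. "admin_user" */
    int is_required;    /* non-zero if the attribute is mandatory */
} demo_entry_t;

/* Hypothetical printf-style helper standing in for sge_dstring_sprintf.
 * On GCC/Clang the format attribute asks the compiler to check that the
 * variadic arguments match the conversion specifiers in fmt. */
#if defined(__GNUC__)
__attribute__((format(printf, 3, 4)))
#endif
static int format_error(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* Builds a "missing attribute" message for entry i of the name array. */
static int report_missing(char *buf, size_t size,
                          const demo_entry_t *name, int i, const char *fname)
{
    /* Buggy form (do not use): passing name[i] hands the whole struct to
     * a %s conversion, which then treats its bytes as a char pointer:
     *
     *   format_error(buf, size, "... <%s> ... %s", name[i], fname);
     *
     * Correct form: pass the string member of the struct. */
    return format_error(buf, size,
                        "cannot read attribute <%s> from file %s",
                        name[i].name, fname);
}
```

Because variadic arguments are not type-checked by default, the buggy call compiles silently. With a format attribute like the one assumed here, compilers that support it can warn about the mismatch when format warnings are enabled.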

Fix is simple:

$ cvs diff -u libs/uti/sge_spool.c
Index: libs/uti/sge_spool.c
===================================================================
RCS file: /cvs/gridengine/source/libs/uti/sge_spool.c,v
retrieving revision 1.24
diff -u -r1.24 sge_spool.c
--- libs/uti/sge_spool.c        29 Apr 2009 08:17:09 -0000      1.24
+++ libs/uti/sge_spool.c        27 Aug 2009 12:53:40 -0000
@@ -647,7 +647,7 @@
             }
             else {
                sge_dstring_sprintf(error_dstring, MSG_UTI_CANNOTLOCATEATTRIBUTE_SS,
-                                   name[i], fname);
+                                   name[i].name, fname);
             }

             break;
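
For reference, the intended behavior for an empty bootstrap file (report the missing attribute instead of crashing) can be sketched roughly as follows. This is only an illustration under assumptions: required_key_t and check_required_keys are hypothetical names, the file is assumed to hold whitespace-separated "key value" lines with '#' comments, and the real sge_get_confval_array is more involved.

```c
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 32

/* Hypothetical stand-in for bootstrap_entry_t. */
typedef struct {
    const char *name;   /* attribute name expected in the file */
    int is_required;    /* non-zero if the attribute is mandatory */
} required_key_t;

/* Returns 0 if every required key appears in fname, otherwise -1 with a
 * human-readable message written to err. */
static int check_required_keys(const char *fname, const required_key_t *keys,
                               int nkeys, char *err, size_t errsize)
{
    char line[1024];
    int found[MAX_KEYS] = {0};
    int i;
    FILE *fp;

    if (nkeys > MAX_KEYS) {
        snprintf(err, errsize, "too many keys (%d)", nkeys);
        return -1;
    }
    fp = fopen(fname, "r");
    if (fp == NULL) {
        snprintf(err, errsize, "cannot open file %s", fname);
        return -1;
    }
    while (fgets(line, sizeof line, fp) != NULL) {
        /* The first whitespace-delimited token on the line is the key. */
        char *key = strtok(line, " \t\r\n");
        if (key == NULL || key[0] == '#') {
            continue;   /* skip blank lines and comments */
        }
        for (i = 0; i < nkeys; i++) {
            if (strcmp(key, keys[i].name) == 0) {
                found[i] = 1;
            }
        }
    }
    fclose(fp);

    for (i = 0; i < nkeys; i++) {
        if (keys[i].is_required && !found[i]) {
            /* Pass the string member, not the struct itself. */
            snprintf(err, errsize, "cannot read attribute <%s> from file %s",
                     keys[i].name, fname);
            return -1;
        }
    }
    return 0;
}
```

A caller would print err and exit with a non-zero status when the function returns -1, which is the graceful failure asked for above.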

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=214571
