Custom Query (431 matches)

Results (154 - 156 of 431)

Ticket Resolution Summary Owner Reporter
#532 fixed IZ2628: Tasks held with array dependency may get deleted prematurely johna
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2628]

        Issue #:      2628             Platform:     All       Reporter: johna (johna)
       Component:     gridengine          OS:        All
     Subcomponent:    qmaster          Version:      6.2beta      CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
          URL:
       * Summary:     Tasks held with array dependency may get deleted prematurely
   Status whiteboard:
      Attachments:

     Issue 2628 blocks:
   Votes for issue 2628:


   Opened: Tue Jun 24 21:25:00 -0700 2008 
------------------------


It seems that tasks in the JB_ja_a_h_ids hold range can get ignored,
leading to the parent job being deleted before they are scheduled to run.

This bug does not appear in the ARI branch and seems to occur only when the
dependent job held with the -hold_jid_ad option has the higher priority. This
probably explains why the QA testing procedure does not detect the issue, since
it presumably does not submit the jobs with different priorities.

This can be reproduced as follows (aimk options are '-spool-classic -parallel 3
-no-dump -debug -no-secure -no-jni -no-java'):

[root@xen-grid1 johna]# qsub -t 1-10 -p -100 -b y /bin/sleep 20
Your job-array 1.1-10:1 ("sleep") has been submitted
[root@xen-grid1 johna]# qsub -t 1-10 -p 100 -hold_jid_ad 1 -b y /bin/sleep 20
Your job-array 2.1-10:1 ("sleep") has been submitted
[root@xen-grid1 johna]# qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      1 0.50617 sleep      root         r     06/25/2008 12:52:18 all.q@xen-grid1.rsp.com.au         1 1
      1 0.00000 sleep      root         qw    06/25/2008 12:52:13                                    1 2-10:1
      2 0.00000 sleep      root         hqw   06/25/2008 12:52:20                                    1 1-10:1
[root@xen-grid1 johna]# qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      1 0.50617 sleep      root         qw    06/25/2008 12:52:13                                    1 2-10:1
      2 0.00000 sleep      root         qw    06/25/2008 12:52:20                                    1 1
      2 0.00000 sleep      root         hqw   06/25/2008 12:52:20                                    1 2-10:1
[root@xen-grid1 johna]# qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      2 0.60383 sleep      root         r     06/25/2008 12:52:48 all.q@xen-grid1.rsp.com.au         1 1
      1 0.50617 sleep      root         qw    06/25/2008 12:52:13                                    1 2-10:1
      2 0.00000 sleep      root         hqw   06/25/2008 12:52:20                                    1 2-10:1
[root@xen-grid1 johna]# qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      2 0.00000 sleep      root         hqw   06/25/2008 12:52:20                                    1 2-10:1
      1 0.50617 sleep      root         qw    06/25/2008 12:52:13                                    1 2-10:1
[root@xen-grid1 johna]# qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      1 0.50617 sleep      root         r     06/25/2008 12:53:18 all.q@xen-grid1.rsp.com.au         1 2
      1 0.50617 sleep      root         qw    06/25/2008 12:52:13                                    1 3-10:1

End result: job 2 is "gone" even though it still has tasks left that are held
with AD. A preliminary investigation on MT has found some missing code lines in
sge_job_qmaster.c, but I have not yet been able to isolate this defect.
#535 fixed IZ2633: memory leak after sge_peopen() in AFS/DCE/KERBEROS code andreas
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2633]

        Issue #:      2633             Platform:     All      Reporter: andreas (andreas)
       Component:     gridengine          OS:        All
     Subcomponent:    qmaster          Version:      6.1u4       CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
          URL:
       * Summary:     memory leak after sge_peopen() in AFS/DCE/KERBEROS code
   Status whiteboard:
      Attachments:

     Issue 2633 blocks:
   Votes for issue 2633:


   Opened: Wed Jun 25 08:46:00 -0700 2008 
------------------------


The AFS/DCE/KERBEROS code in libs/gdi/sge_security.c leaks memory.
Each time sge_peopen() is called to launch one of the script plug-in
procedures, sge_bin2string() allocates memory that is never free()'d.

I'm filing this bug against qmaster because some of the procedures are launched
by qmaster.
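
Editorial sketch: the leak described above follows a common C ownership pattern, where a helper returns heap memory and the caller forgets to release it. The code below is a minimal illustration under stated assumptions, not the Grid Engine code: encode_hex_sketch() and use_encoded_sketch() are hypothetical names, and the hex encoding is only a stand-in for whatever sge_bin2string() actually produces.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a helper that returns heap memory.
 * Hex-encodes len bytes; the caller owns the result and must free() it. */
static char *encode_hex_sketch(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    char *out = malloc(len * 2 + 1);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        /* writes two hex digits plus a terminating NUL */
        snprintf(out + i * 2, 3, "%02x", (unsigned int)buf[i]);
    }
    out[len * 2] = '\0';
    return out;
}

/* Hypothetical caller: uses the encoded string, then releases it.
 * Omitting the free() here would leak one buffer per call. */
static int use_encoded_sketch(const unsigned char *buf, size_t len)
{
    char *encoded = encode_hex_sketch(buf, len);
    if (encoded == NULL) {
        return -1;
    }
    int used = (int)strlen(encoded); /* stand-in for handing it to a child */
    free(encoded);
    return used;
}
```

Whether the real fix belongs in the caller or in the helper is not stated in this report; the sketch only shows the caller-frees variant.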
#536 fixed IZ2635: Fails to build libs/uti/sge_getloadavg.c with gcc 4.3.1 paulmillar
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2635]

        Issue #:      2635             Platform:     All       Reporter: paulmillar (paulmillar)
       Component:     gridengine          OS:        All
     Subcomponent:    build            Version:      current      CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
          URL:
       * Summary:     Fails to build libs/uti/sge_getloadavg.c with gcc 4.3.1
   Status whiteboard:
      Attachments:
                      Date/filename:                           Description:                                           Submitted by:
                      Wed Jul 2 01:53:00 -0700 2008: 2635.diff A source diff that should fix the problem (text/plain) andreas
                      Wed Jul 2 01:59:00 -0700 2008: 2635.diff New try (former diff was buggy) (text/plain)           andreas

     Issue 2635 blocks:
   Votes for issue 2635:


   Opened: Tue Jul 1 16:36:00 -0700 2008 
------------------------


I tried to build gridengine from cvs on Debian sid.  The current HEAD failed
whilst building the libraries, in particular with libs/uti/sge_getloadavg.c.
The problem was with line 1306 and I've copied the output below:


_________C_O_R_E__S_Y_S_T_E_M_____________
gcc -O3 -Wall -Werror -Wstrict-prototypes -D__GRIDENGINE_FD_SETSIZE=8192 -DLINUX
-DLINUX86 -DLINUX86_26 -D_GNU_SOURCE -DGETHOSTBYNAME_R6 -DGETHOSTBYADDR_R8
-DLOAD_OPENSSL -I/vol2/SW/db-4.4.20/lx26-x86/include/ -DSGE_ARCH_STRING=lx26-x86
-DTARGET_32BIT  -DSPOOLING_dynamic -DSECURE
-I/vol2/tools/SW/openssl-0.9.8g-origin/lx26-x86/include -Wno-strict-aliasing
-D_FILE_OFFSET_BITS=64 -DCOMPILE_DC -D__SGE_COMPILE_WITH_GETTEXT__
-D__SGE_NO_USERMAPPING__ -I../common -I../libs -I../libs/uti -I../libs/juti
-I../libs/gdi -I../libs/japi -I../libs/sgeobj -I../libs/cull -I../libs/rmon
-I../libs/comm -I../libs/comm/lists -I../libs/sched -I../libs/evc -I../libs/evm
-I../libs/mir -I../libs/lck -I../daemons/common -I../daemons/qmaster
-I../daemons/execd -I../daemons/schedd -I../clients/common -I.
-I/usr/lib/jvm/java-6-sun/include -I/usr/lib/jvm/java-6-sun/include/linux  -fPIC
-c ../libs/uti/sge_getloadavg.c
cc1: warnings being treated as errors
../libs/uti/sge_getloadavg.c: In function 'get_cpu_load':
../libs/uti/sge_getloadavg.c:1306: error: array subscript is above array bounds
../libs/uti/sge_getloadavg.c:1306: error: array subscript is above array bounds
make: *** [sge_getloadavg.o] Error 1

I've not traced the logic of the function, but the code doesn't pass the "sniff
test".  I've included a patch below that fixes this issue and lets the
compilation progress, although the build failed later on.


Index: libs/uti/sge_getloadavg.c
===================================================================
RCS file: /cvs/gridengine/source/libs/uti/sge_getloadavg.c,v
retrieving revision 1.38
diff -u -r1.38 sge_getloadavg.c
--- libs/uti/sge_getloadavg.c   15 Apr 2008 12:40:54 -0000      1.38
+++ libs/uti/sge_getloadavg.c   1 Jul 2008 23:19:22 -0000
@@ -1302,10 +1302,11 @@
    /* calculate percentages based on overall change, rounding up */
    half_total = total_change / 2l;
    for (i = 0; i < cnt; i++) {
-      *out = ((double)((*diffs++ * 1000 + half_total) / total_change))/10;
+      *out = ((double)((*diffs * 1000 + half_total) / total_change))/10;
       DPRINTF(("diffs: %lu half_total: %lu total_change: %lu -> %f",
             *diffs, half_total, total_change, *out));
       out++;
+      diffs++;
    }

    DEXIT;


Naturally, someone who understands the precise semantics of this function should
review the patch.
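
Editorial sketch: the arithmetic in the patched loop can be read in isolation as follows. This is a minimal illustration, assuming non-negative differences and a positive total; percentages_sketch() and its signature are hypothetical and are not the function in sge_getloadavg.c. It indexes the arrays instead of incrementing pointers, so the question of where the increment happens does not arise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper modeled on the loop in the patch above.
 * Converts each diffs[i] into a percentage of total_change with one
 * decimal place. Adding half_total before the integer division rounds
 * the tenths-of-a-percent value to the nearest integer. */
static void percentages_sketch(size_t cnt, const long *diffs,
                               long total_change, double *out)
{
    assert(total_change > 0);
    long half_total = total_change / 2L;

    for (size_t i = 0; i < cnt; i++) {
        out[i] = (double)((diffs[i] * 1000 + half_total) / total_change) / 10;
    }
}
```

For example, with diffs of 1, 1 and 2 and a total of 4, the first element becomes (1000 + 2) / 4 = 250 in integer arithmetic, which is 25.0 after the final division by 10.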

Cheers,

Paul.

PS. Can one attach patches to a bug with this issue-tracker?  I'd guess that
posting patches in-line is fragile.

   ------- Additional comments from andreas Wed Jul 2 01:53:01 -0700 2008 -------
Created an attachment (id=176)
A source diff that should fix the problem

   ------- Additional comments from andreas Wed Jul 2 01:59:21 -0700 2008 -------
Created an attachment (id=177)
New try (former diff was buggy)

   ------- Additional comments from andreas Wed Jul 2 02:02:08 -0700 2008 -------
Paul,

could you try the second diff that I attached to this issue and let me know the
result?

Note, the first one was buggy, since DPRINTF expressions are evaluated in
monitoring mode only. For that reason increments must be done outside the
DPRINTF statements.

Regards,
Andreas

   ------- Additional comments from paulmillar Wed Jul 2 14:48:01 -0700 2008 -------
Hi Andreas,

Thanks for looking into this.

Both patches look broken to me.  The first patch *only* increments the two ptrs
inside the DPRINTF, which (as you say) is broken if monitoring is switched off;
the second patch increments both inside and outside the DPRINTF, which is broken
if monitoring is switched on!

Could you have another look at my patch?  I still believe this is the correct fix.

Cheers,

Paul.

PS.  Is it possible to use unified output for diffs ("cvs diff -u")?  I find
these easier to read.
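
Editorial sketch: the two failure modes described in this comment can be reproduced with a small stand-in macro. This is an illustration under assumptions, not the Grid Engine DPRINTF: SKETCH_DPRINTF and sketch_trace_on are hypothetical, and the macro is assumed to evaluate its arguments only when tracing is enabled. Each function returns how far a cursor has advanced after cnt passes through the loop.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical trace macro: the argument list is evaluated only when
 * sketch_trace_on is non-zero. This is NOT the Grid Engine DPRINTF. */
static int sketch_trace_on = 0;
#define SKETCH_DPRINTF(args) \
    do { if (sketch_trace_on) printf args; } while (0)

/* Increment only inside the trace call: stalls when tracing is off. */
static int advance_inside_only(int cnt)
{
    int pos = 0;
    assert(cnt >= 0);
    for (int i = 0; i < cnt; i++) {
        SKETCH_DPRINTF(("pos: %d\n", pos++));
    }
    return pos;
}

/* Increment inside and outside: advances twice per pass when tracing is on. */
static int advance_inside_and_outside(int cnt)
{
    int pos = 0;
    assert(cnt >= 0);
    for (int i = 0; i < cnt; i++) {
        SKETCH_DPRINTF(("pos: %d\n", pos++));
        pos++;
    }
    return pos;
}

/* Increment only outside: advances once per pass in both modes. */
static int advance_outside_only(int cnt)
{
    int pos = 0;
    assert(cnt >= 0);
    for (int i = 0; i < cnt; i++) {
        SKETCH_DPRINTF(("pos: %d\n", pos));
        pos++;
    }
    return pos;
}
```

Only the last variant gives the same result with tracing on and off, which is the shape of the patch in the original report: the side effect is kept out of the trace call.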

   ------- Additional comments from paulmillar Mon Jul 14 17:06:04 -0700 2008 -------
Hi Andreas,

A couple of updates on this issue:

The first point is I've tried the second version of the patch, as you recommended.

At first it seemed to work; however, I was concerned that the compiler was
somehow factoring out the DPRINTF macro (hence the diffs++ and out++ within the
DPRINTF macro are never evaluated).  This would hide the problem until someone
attempts to compile with an enabled DPRINTF.  To test this, I replaced the
DPRINTF macro with a simple printf and the compilation broke again:

gcc -O3 -Wall -Werror -Wstrict-prototypes -D__GRIDENGINE_FD_SETSIZE=8192
[... many more arguments ...]
-I/usr/lib/jvm/java-6-sun-1.6.0.07/jre/include/linux  -fPIC -c
../libs/uti/sge_getloadavg.c
cc1: warnings being treated as errors
../libs/uti/sge_getloadavg.c: In function 'get_cpu_load':
../libs/uti/sge_getloadavg.c:1305: error: array subscript is above array bounds
../libs/uti/sge_getloadavg.c:1305: error: array subscript is above array bounds
../libs/uti/sge_getloadavg.c:1305: error: array subscript is above array bounds
../libs/uti/sge_getloadavg.c:1305: error: array subscript is above array bounds
make: *** [sge_getloadavg.o] Error 1

I believe this demonstrates that the second version (of the patch) is still
broken --- although I don't know why DPRINTF is not having any effect: is DPRINTF
not available for the uti library?

The second point is that I've just noticed that there's a function called
percentages_new() in the same file that is similar to percentages().

The patches so far only fix percentages() and not percentages_new().  The latter
looks to have the same problem as the former, but was not picked up by gcc as
the code is wrapped by some preprocessor tests for (I believe) compilation
architecture.

HTH,

Paul.