Custom Query (431 matches)

Results (97 - 99 of 431)

Ticket Resolution Summary Owner Reporter
#360 fixed IZ2062: Memory leak in qmaster olle
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2062]

        Issue #:      2062             Platform:     All      Reporter: olle (olle)
       Component:     gridengine          OS:        Linux
     Subcomponent:    qmaster          Version:      6.0u8       CC:    reuti, uddeborg
        Status:       REOPENED         Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: 6.0u8
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
          URL:
       * Summary:     Memory leak in qmaster
   Status whiteboard:
      Attachments:

     Issue 2062 blocks:
   Votes for issue 2062:


   Opened: Wed May 24 02:53:00 -0700 2006 
------------------------


After five days of uptime our qmaster process is above 10GB and is still growing
in size.

I have no idea how to debug this on a running cluster in production, and I have
not managed to reproduce it on a smaller environment.

Any ideas are welcome.

   ------- Additional comments from uddeborg Wed May 24 06:00:42 -0700 2006 -------
The requirement to enter a comment to add yourself as a CC is a bit silly.

   ------- Additional comments from reuti Mon Sep 18 06:00:27 -0700 2006 -------
For us this seems to happen when the accounting file reaches a certain size. Deleting the accounting file
and restarting the qmaster apparently solved the problem.

   ------- Additional comments from olle Mon Sep 18 06:30:33 -0700 2006 -------
Any idea on what size would trigger it?

We have a daily rotation of the accounting file and usually less than 30000
lines in one file.

   ------- Additional comments from andreas Mon Sep 18 07:09:04 -0700 2006 -------
I really have no idea how to explain it. The qmaster never reads the
accounting file. All qmaster does is append a line to the accounting file for
each record.

Actually, if qmaster memory growth can be reproduced, it would be interesting to
see whether 'accounting_flush_time' setting in sge_conf(5) has any effect on it ...

   ------- Additional comments from joga Fri Jan 26 04:43:20 -0700 2007 -------
Has been fixed in 6.0u8.

Problem was:
qmaster buffers the accounting records, and writes the buffered data in fixed
intervals.

When closing the accounting file after writing failed (e.g. when the filesystem
was full), the buffer was not deleted.

Beginning with 6.0u8, the buffer is always cleared, regardless of whether the
write succeeded.

Of course, this may lead to data loss if, for example, the filesystem is full.
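
The behaviour described above can be illustrated with a minimal Python sketch. It is not the qmaster code: the class AccountingBuffer, its methods, and the write callable are hypothetical names chosen for illustration. It only shows the idea of clearing the buffer whether or not the write succeeded, which bounds memory use at the cost of losing records when a write fails.

```python
class AccountingBuffer:
    """Hypothetical stand-in for an in-memory accounting record buffer."""

    def __init__(self, write):
        # `write` is any callable that persists a string and may raise OSError.
        self._write = write
        self._records = []

    def add(self, record):
        """Buffer one accounting record until the next flush."""
        self._records.append(record)

    def pending(self):
        """Return the number of records still held in memory."""
        return len(self._records)

    def flush(self):
        """Try to write all buffered records; clear the buffer either way."""
        data = "".join(record + "\n" for record in self._records)
        try:
            self._write(data)
            return True
        except OSError:
            # The write failed (e.g. file system full): these records are lost.
            return False
        finally:
            # Always clear, so repeated failures cannot grow the buffer.
            self._records.clear()


def failing_write(data):
    """Simulate a full file system."""
    raise OSError("No space left on device")


buf = AccountingBuffer(failing_write)
buf.add("record for job 1")
buf.add("record for job 2")
ok = buf.flush()
print(ok, buf.pending())
```

If the clear were done only on success, every failed flush would leave the records in memory and the buffer would keep growing, which matches the leak described in this comment.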

   ------- Additional comments from olle Mon Feb 19 08:44:41 -0700 2007 -------
> Has been fixed in 6.0u8

I think I reported this issue on 6.0u8. It might have been introduced in earlier
versions, but it was not fixed in the courtesy binaries of version 6.0u8.
#361 invalid IZ2065: qstat -r -xml is missing entry ovid
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2065]

        Issue #:      2065             Platform:     Sun      Reporter: ovid (ovid)
       Component:     gridengine          OS:        All
     Subcomponent:    clients          Version:      6.0u4       CC:    None defined
        Status:       NEW              Priority:     P4
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    roland (roland)
      QA Contact:     roland
          URL:
       * Summary:     qstat -r -xml is missing entry
   Status whiteboard:
      Attachments:

     Issue 2065 blocks:
   Votes for issue 2065:


   Opened: Fri May 26 13:48:00 -0700 2006 
------------------------


The qstat -r -xml output does not have a Master queue entry.

Here's a fragment of a plain qstat -r output:


sgetest@dt218-123# qstat -r
job-ID  prior   name       user         state submit/start at     queue
                 slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   7675 0.50500 STARTED    sgetest      r     05/26/2006 13:07:38
all.q@dt218-123                    1
       Full jobname:     STARTED
       Master queue:     all.q@dt218-123
       Hard Resources:
       Soft Resources:
       Hard requested queues: all.q

.....


and here's the corresponding qstat -r -xml output:


sgetest@dt218-123# qstat -r -xml
<?xml version='1.0'?>
<job_info  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <queue_info>
    <job_list state="running">
      <JB_job_number>7675</JB_job_number>
      <JAT_prio>0.50500</JAT_prio>
      <JB_name>STARTED</JB_name>
      <JB_owner>sgetest</JB_owner>
      <state>r</state>
      <JAT_start_time>05/26/2006 13:07:38</JAT_start_time>
      <queue_name>all.q@dt218-123</queue_name>
      <slots>1</slots>
      <hard_req_queue>all.q</hard_req_queue>
    </job_list>


Notice that there is no Master queue entry in the XML output.

   ------- Additional comments from ovid Fri May 26 14:56:48 -0700 2006 -------

It also lacks an entry for Full jobname.

qstat -r looks like this:

....

  7750 0.00000 HOLD       sgetest      hqw   05/26/2006 14:18:21
                    1
       Full jobname:     HOLD
       Hard Resources:
       Soft Resources:
       Hard requested queues: all.q


while qstat -r -xml looks like this:


......

  <job_list state="pending">
      <JB_job_number>7750</JB_job_number>
      <JAT_prio>0.00000</JAT_prio>
      <JB_name>HOLD</JB_name>
      <JB_owner>sgetest</JB_owner>
      <state>hqw</state>
      <JB_submission_time>05/26/2006 14:18:21</JB_submission_time>
      <queue_name></queue_name>
      <slots>1</slots>
      <hard_req_queue>all.q</hard_req_queue>
    </job_list>
  </job_info>
</job_info>


Note that there is an entry for JB_name, but none for Full jobname.

   ------- Additional comments from ovid Fri May 26 15:52:17 -0700 2006 -------
For Hard Resources, the output is inconsistent between qstat -r and
qstat -r -xml.

For qstat -r, we have:

....

 7794 0.55500 AMD64      sgetest      r     05/26/2006 15:13:28 all.q@dt218-151
                   1 5
       Full jobname:     AMD64
       Master queue:     all.q@dt218-151
       Hard Resources:   arch=lx24-amd64 (0.000000)
       Soft Resources:


while for qstat -r -xml, we have
.....

 <job_list state="running">
      <JB_job_number>7794</JB_job_number>
      <JAT_prio>0.55500</JAT_prio>
      <JB_name>AMD64</JB_name>
      <JB_owner>sgetest</JB_owner>
      <state>r</state>
      <JAT_start_time>05/26/2006 15:13:28</JAT_start_time>
      <queue_name>all.q@dt218-151</queue_name>
      <slots>1</slots>
      <tasks>5</tasks>
      <hard_request name="arch"
resource_contribution="0.000000">lx24-amd64</hard_request>
    </job_list>


Note that here we don't have the "arch=" part.
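
For comparison, here is a minimal Python sketch that rebuilds the plain-text "name=value (contribution)" form from the XML fragment above. In the XML, the resource name is carried in the name attribute of hard_request rather than as an "arch=" prefix. The function name hard_resources and the trimmed SAMPLE string are assumptions made for illustration; only the element and attribute names are taken from the output shown in this report.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the job_list fragment quoted in this report.
SAMPLE = """<job_list state="running">
  <JB_job_number>7794</JB_job_number>
  <hard_request name="arch" resource_contribution="0.000000">lx24-amd64</hard_request>
</job_list>"""


def hard_resources(job_list_xml):
    """Return hard requests formatted like the plain qstat -r output."""
    root = ET.fromstring(job_list_xml)
    formatted = []
    for request in root.findall("hard_request"):
        name = request.get("name")
        contribution = request.get("resource_contribution")
        formatted.append(f"{name}={request.text} ({contribution})")
    return formatted


print(hard_resources(SAMPLE))  # ['arch=lx24-amd64 (0.000000)']
```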


#362 fixed IZ2066: drmaa_run_bulk_jobs() input parameter data types must be consistent with 1.0 specification andreas
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2066]

        Issue #:      2066             Platform:     Sun      Reporter: andreas (andreas)
       Component:     gridengine          OS:        All
     Subcomponent:    drmaa            Version:      6.0         CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    templedf (templedf)
      QA Contact:     templedf
          URL:
       * Summary:     drmaa_run_bulk_jobs() input parameter data types must be consistent with 1.0 specification
   Status whiteboard:
      Attachments:

     Issue 2066 blocks:
   Votes for issue 2066:


   Opened: Wed May 31 07:06:00 -0700 2006 
------------------------


DESCRIPTION:
The DRMAA 1.0 specification defines drmaa_run_bulk_jobs parameters as:

  start, end - unsigned integer
  incr - signed integer

In the DRMAA 0.95 binding, 'start' and 'end' are signed integers.
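
Why the signedness matters can be shown with a small Python sketch using ctypes. It does not call any DRMAA function. The 32-bit width and the helper names as_signed32 and as_unsigned32 are assumptions for illustration, since the report does not state the exact C types used by either binding.

```python
import ctypes


def as_signed32(value):
    """Interpret value as a 32-bit signed integer (assumed width)."""
    return ctypes.c_int32(value).value


def as_unsigned32(value):
    """Interpret value as a 32-bit unsigned integer (assumed width)."""
    return ctypes.c_uint32(value).value


start = 3_000_000_000  # fits in unsigned 32 bits, but not in signed 32 bits
print(as_unsigned32(start))  # 3000000000
print(as_signed32(start))    # -1294967296
```

Under these assumptions, a task index above 2147483647 is valid for an unsigned parameter but wraps to a negative number when passed through a signed one.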