Custom Query (431 matches)

Results (76 - 78 of 431)

Ticket Resolution Summary Owner Reporter
#360 fixed IZ2062: Memory leak in qmaster olle
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2062]

        Issue #:          2062
        Platform:         All
        OS:               Linux
        Reporter:         olle (olle)
        Component:        gridengine
        Subcomponent:     qmaster
        Version:          6.0u8
        CC:               reuti, uddeborg
        Status:           REOPENED
        Priority:         P3
        Resolution:
        Issue type:       DEFECT
        Target milestone: 6.0u8
        Assigned to:      ernst (ernst)
        QA Contact:       ernst
        URL:
        Summary:          Memory leak in qmaster
        Status whiteboard:
        Attachments:

        Issue 2062 blocks:
        Votes for issue 2062:


   Opened: Wed May 24 02:53:00 -0700 2006 
------------------------


After five days of uptime our qmaster process is above 10GB and is still growing
in size.

I have no idea how to debug this on a running cluster in production, and I have
not managed to reproduce it on a smaller environment.

Any ideas are welcome.

   ------- Additional comments from uddeborg Wed May 24 06:00:42 -0700 2006 -------
The requirement to enter a comment to add yourself as a CC is a bit silly.

   ------- Additional comments from reuti Mon Sep 18 06:00:27 -0700 2006 -------
For us this seems to happen when the accounting file reaches a certain size. Deleting the accounting file
and restarting the qmaster apparently solved the problem.

   ------- Additional comments from olle Mon Sep 18 06:30:33 -0700 2006 -------
Any idea on what size would trigger it?

We have a daily rotation of the accounting file and usually less than 30000
lines in one file.

   ------- Additional comments from andreas Mon Sep 18 07:09:04 -0700 2006 -------
I really have no idea how to explain it. The qmaster never reads the
accounting file; all qmaster does is append a line to the accounting file for
each record.

Actually, if the qmaster memory growth can be reproduced, it would be interesting to
see whether the 'accounting_flush_time' setting in sge_conf(5) has any effect on it ...

   ------- Additional comments from joga Fri Jan 26 04:43:20 -0700 2007 -------
Has been fixed in 6.0u8.

Problem was:
qmaster buffers the accounting records and writes the buffered data at fixed
intervals.

When the accounting file was closed after a failed write (e.g. when the
filesystem was full), the buffer was not deleted.

Beginning with 6.0u8, the buffer is always cleared, regardless of whether the
write succeeded.

Of course this may lead to data loss, if for example the filesystem is full.
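To make the fix concrete, here is a minimal, self-contained sketch of the pattern
described above. It is not the actual qmaster source; the names (acct_append,
acct_flush, acct_buffer) are hypothetical. Before 6.0u8 the buffered records were
kept when the write failed, so the buffer grew without bound; the fix clears the
buffer regardless of the result, at the cost of losing the unwritten records.

/* Sketch of the accounting-buffer fix -- not the actual qmaster code. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char  *acct_buffer = NULL;   /* buffered accounting records */
static size_t acct_len    = 0;

/* qmaster only ever appends records; it never reads the accounting file back */
static void acct_append(const char *record)
{
    size_t rlen = strlen(record);
    char *tmp = realloc(acct_buffer, acct_len + rlen + 1);
    if (tmp == NULL)
        return;                      /* out of memory: drop the record */
    acct_buffer = tmp;
    memcpy(acct_buffer + acct_len, record, rlen + 1);
    acct_len += rlen;
}

/* Called at fixed intervals.  Pre-6.0u8: the buffer was kept when the write
 * failed (e.g. filesystem full) and so it kept growing.  6.0u8 behaviour
 * (shown here): the buffer is always cleared, even if nothing was written. */
static void acct_flush(const char *path)
{
    FILE *fp = fopen(path, "a");
    if (fp != NULL) {
        if (acct_len > 0 && fwrite(acct_buffer, 1, acct_len, fp) < acct_len) {
            /* write failed, e.g. filesystem full -- these records are lost */
        }
        fclose(fp);
    }

    /* the fix: always drop the buffered data */
    free(acct_buffer);
    acct_buffer = NULL;
    acct_len = 0;
}

int main(void)
{
    acct_append("job 11589 finished\n");
    acct_append("job 11590 finished\n");
    acct_flush("accounting");
    return 0;
}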

   ------- Additional comments from olle Mon Feb 19 08:44:41 -0700 2007 -------
> Has been fixed in 6.0u8

I think I reported this issue against 6.0u8. It might have been introduced in earlier
versions, but it was not fixed in the courtesy binaries of version 6.0u8.
#362 fixed IZ2066: drmaa_run_bulk_jobs() input parameter data types must be consistent with 1.0 specification andreas
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2066]

        Issue #:          2066
        Platform:         Sun
        OS:               All
        Reporter:         andreas (andreas)
        Component:        gridengine
        Subcomponent:     drmaa
        Version:          6.0
        CC:               None defined
        Status:           NEW
        Priority:         P3
        Resolution:
        Issue type:       DEFECT
        Target milestone: ---
        Assigned to:      templedf (templedf)
        QA Contact:       templedf
        URL:
        Summary:          drmaa_run_bulk_jobs() input parameter data types must be consistent with 1.0 specification
        Status whiteboard:
        Attachments:

        Issue 2066 blocks:
        Votes for issue 2066:


   Opened: Wed May 31 07:06:00 -0700 2006 
------------------------


DESCRIPTION:
The DRMAA 1.0 specification defines drmaa_run_bulk_jobs parameters as:

  start, end - unsigned integer
  incr - signed integer

In the DRMAA 0.95 binding, 'start' and 'end' are signed integers.
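For illustration, a sketch of how the difference shows up in C prototypes. This is
not copied from a particular drmaa.h; the opaque types and the surrounding
parameters are sketched from the usual C binding, and the _095/_10 suffixes exist
only to put the two variants side by side. The point is the type of 'start' and 'end'.

#include <stddef.h>

/* opaque handle types, stand-ins for the ones declared in drmaa.h */
typedef struct drmaa_job_ids_s      drmaa_job_ids_t;
typedef struct drmaa_job_template_s drmaa_job_template_t;

/* DRMAA 0.95-style binding: 'start' and 'end' are signed */
int drmaa_run_bulk_jobs_095(drmaa_job_ids_t **jobids,
                            const drmaa_job_template_t *jt,
                            int start, int end, int incr,
                            char *error_diagnosis, size_t error_diag_len);

/* What the 1.0 specification asks for: 'start' and 'end' unsigned, 'incr' signed */
int drmaa_run_bulk_jobs_10(drmaa_job_ids_t **jobids,
                           const drmaa_job_template_t *jt,
                           unsigned int start, unsigned int end, int incr,
                           char *error_diagnosis, size_t error_diag_len);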
#367 fixed IZ2073: qstat -f -qs E broken ovid
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2073]

        Issue #:          2073
        Platform:         Sun
        OS:               All
        Reporter:         ovid (ovid)
        Component:        gridengine
        Subcomponent:     clients
        Version:          6.0u4
        CC:               None defined
        Status:           NEW
        Priority:         P4
        Resolution:
        Issue type:       DEFECT
        Target milestone: ---
        Assigned to:      roland (roland)
        QA Contact:       roland
        URL:
        Summary:          qstat -f -qs E broken
        Status whiteboard:
        Attachments:

        Issue 2073 blocks:
        Votes for issue 2073:


   Opened: Wed Jun 14 09:00:00 -0700 2006 
------------------------


qstat -f -qs E is totally broken.

I get the following:

sgetest@dt218-123# qstat -f -qs E
sgetest@dt218-123# qstat -f -qs E -xml


But without the -f flag, it seems OK:

sgetest@dt218-123# qstat -qs E
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  11589 0.50500 ERROR      sgetest      Eqw   06/14/2006 06:57:45                                    1
  11590 0.00000 HOLD       sgetest      hqw   06/14/2006 06:57:45                                    1
sgetest@dt218-123# qstat -qs E -xml
<?xml version='1.0'?>
<job_info  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <job_info>
    <job_list state="pending">
      <JB_job_number>11589</JB_job_number>
      <JAT_prio>0.50500</JAT_prio>
      <JB_name>ERROR</JB_name>
      <JB_owner>sgetest</JB_owner>
      <state>Eqw</state>
      <JB_submission_time>06/14/2006 06:57:45</JB_submission_time>
      <queue_name></queue_name>
      <slots>1</slots>
    </job_list>
    <job_list state="pending">
      <JB_job_number>11590</JB_job_number>
      <JAT_prio>0.00000</JAT_prio>
      <JB_name>HOLD</JB_name>
      <JB_owner>sgetest</JB_owner>
      <state>hqw</state>
      <JB_submission_time>06/14/2006 06:57:45</JB_submission_time>
      <queue_name></queue_name>
      <slots>1</slots>
    </job_list>
  </job_info>
</job_info>

   ------- Additional comments from roland Thu Nov 16 01:06:35 -0700 2006 -------
The comments section is incorrect because it mixes up job errors with queue errors.
The "qstat -qs E" switch is a queue filter and shows all queues in error state
AND all pending jobs. "qstat -f -qs E" should print the same output as
"qstat -qs E". This is not the case; it prints nothing.

My suspicion is that qstat -f recognizes that no queues are selected and then
prints no output. This is wrong because it still has to print the pending jobs.
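A minimal model of the behaviour described in the comment above -- not the actual
qstat source; the types, names and output format are hypothetical. It only
illustrates the expectation that the pending-job list is printed even when the
-qs E filter selects no queues, which is exactly what "qstat -f -qs E" fails to do.

#include <stdio.h>

struct queue { const char *name; char state; };               /* 'E' = error */
struct job   { int id; const char *name; const char *state; };

/* full-format output: queue sections first, then the pending jobs */
static void print_full(const struct queue *queues, int nq,
                       const struct job *pending, int nj)
{
    for (int i = 0; i < nq; i++) {
        if (queues[i].state == 'E')                   /* the -qs E filter */
            printf("%-30s %c\n", queues[i].name, queues[i].state);
    }

    /* Pending jobs must be printed even when no queue matched the filter;
     * returning early on an empty queue selection is the reported bug. */
    printf("########### PENDING JOBS ###########\n");
    for (int j = 0; j < nj; j++)
        printf("%6d %-10s %s\n", pending[j].id, pending[j].name, pending[j].state);
}

int main(void)
{
    struct queue queues[]  = { { "all.q@dt218-123", 'a' } };  /* no queue in error */
    struct job   pending[] = { { 11589, "ERROR", "Eqw" },
                               { 11590, "HOLD",  "hqw" } };

    print_full(queues, 1, pending, 2);
    return 0;
}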