Opened 13 years ago

Closed 5 years ago

#360 closed defect (fixed)

IZ2062: Memory leak in qmaster

Reported by: olle Owned by:
Priority: normal Milestone:
Component: sge Version: 6.0u8
Severity: minor Keywords: Linux qmaster
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2062]

        Issue #:      2062             Platform:     All      Reporter: olle (olle)
       Component:     gridengine          OS:        Linux
     Subcomponent:    qmaster          Version:      6.0u8       CC:    reuti, uddeborg
        Status:       REOPENED         Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: 6.0u8
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
          URL:
       * Summary:     Memory leak in qmaster
   Status whiteboard:
      Attachments:

     Issue 2062 blocks:
   Votes for issue 2062:


   Opened: Wed May 24 02:53:00 -0700 2006 
------------------------


After five days of uptime, our qmaster process has grown beyond 10 GB and is
still growing.

I have no idea how to debug this on a running production cluster, and I have
not managed to reproduce it in a smaller environment.

Any ideas are welcome.

   ------- Additional comments from uddeborg Wed May 24 06:00:42 -0700 2006 -------
The requirement to enter a comment to add yourself as a CC is a bit silly.

   ------- Additional comments from reuti Mon Sep 18 06:00:27 -0700 2006 -------
For us this seems to happen when the accounting file reaches a certain size. Deleting the accounting file
and restarting qmaster apparently solved the problem.

   ------- Additional comments from olle Mon Sep 18 06:30:33 -0700 2006 -------
Any idea on what size would trigger it?

We rotate the accounting file daily, and a single file usually has fewer than
30000 lines.

   ------- Additional comments from andreas Mon Sep 18 07:09:04 -0700 2006 -------
I really have no idea how to explain this. qmaster never reads the accounting
file; all it does is append one line to the accounting file for each record.

Actually, if the qmaster memory growth can be reproduced, it would be interesting
to see whether the 'accounting_flush_time' setting in sge_conf(5) has any effect on it.

   ------- Additional comments from joga Fri Jan 26 04:43:20 -0700 2007 -------
Has been fixed in 6.0u8.

The problem was:
qmaster buffers the accounting records and writes the buffered data at fixed
intervals.

When the accounting file was closed after a failed write (e.g. because the
filesystem was full), the buffer was not cleared.

Beginning with 6.0u8, the buffer is always cleared, regardless of whether the
write succeeded.

Of course, this may lead to data loss, for example if the filesystem is full.

   ------- Additional comments from olle Mon Feb 19 08:44:41 -0700 2007 -------
> Has been fixed in 6.0u8

I think I reported this issue against 6.0u8. It might have been introduced in
earlier versions, but it was not fixed in the courtesy binaries of version 6.0u8.

Change History (2)

comment:1 Changed 5 years ago by Dave Love <d.love@…>

In 4735/sge:

Remove CCT_job_messages element and dependent code
The lists are set but never got. This fixed an instance of an occasional
qmaster space leak, probably responsible for both open issues.
Refs #360, #682.

comment:2 Changed 5 years ago by dlove

  • Resolution set to fixed
  • Severity set to minor
  • Status changed from new to closed

Assume it's a duplicate of #682.
