[GE users] Memory leak in 6.1u2 ?

Andy Schwierskott andy.schwierskott at sun.com
Thu Nov 15 13:21:39 GMT 2007


Hi Richard,

From what you describe below, it seems that NAGIOS is monitoring and
managing SGE. That sounds quite interesting. Can you tell us more about it?

Regarding the potential memory leak: could you run the SGE qmaster node on a
different platform than openSUSE 10.3, ideally a somewhat older release? I'm
asking because, in principle, the memory leak or allocation problem could
also be in one of the system libraries.
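
To tell steady growth from a one-off spike before and after such a platform
change, it may help to log the scheduler's resident memory over time. Below
is a minimal sketch, assuming a Linux master with /proc mounted and a
scheduler that runs as its own process (sge_schedd on a typical 6.1
install). The helper names, sample count and interval are hypothetical
choices for illustration, not part of SGE.

```python
import re
import time
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_kb(samples):
    """Difference between the last and the first RSS sample, in kB."""
    return samples[-1] - samples[0] if len(samples) >= 2 else 0


def sample_rss(pid, count=5, interval=60.0):
    """Read the resident set size of process `pid` `count` times.

    `pid` is assumed to be the scheduler's process id, looked up by hand
    (for example with ps). Sleeps `interval` seconds between reads.
    """
    samples = []
    for i in range(count):
        text = Path(f"/proc/{pid}/status").read_text()
        rss = parse_vmrss_kb(text)
        if rss is not None:
            samples.append(rss)
        if i < count - 1:
            time.sleep(interval)
    return samples
```

Calling growth_kb(sample_rss(pid)) with the scheduler's pid gives the change
in resident memory over the sampling window. A value that keeps rising
across windows while the job count stays flat would point towards a leak.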

Regards,
Andy





On Thu, 15 Nov 2007, Andreas.Haas at Sun.COM wrote:

> Hi Richard,
>
> In 6.1u1 we fixed a scheduler memory leak
>
>   http://gridengine.sunsource.net/project/gridengine/61patches.txt
>
> but since then I'm not aware of any scheduler leaks.
>
> As for hunting the leak, I could provide you with an instrumented
> scheduler binary that records allocation/deallocation information
> in a form that can be analyzed with the Linux mtrace(3) utility.
>
> Regards,
> Andreas
>
>
> On Thu, 15 Nov 2007, Richard Ems wrote:
>
>> Hi!
>> 
>> There seems to be a memory leak in 6.1u2.
>> We are having this problem several times a day. (NAGIOS is now
>> restarting the master/scheduler process several times daily!)
>> 
>> Our config:
>> 
>> 1. SGE 6.1u2 on master and nodes.
>> 
>> 2. openSUSE 10.3 64 bit on master and nodes.
>> 
>> 3. Many queues and parallel environments! We have one queue and one
>> parallel environment per node; see
>> http://gridengine.info/articles/2006/02/14/grouping-jobs-to-nodes-via-wildcard-pes
>> 
>> The problem:
>> The scheduler takes up the whole system memory, growing to between 3 and
>> 4 GB.
>> 
>> 
>> There are not many jobs: fewer than 100 waiting and fewer than 100 running.
>> 
>> Where should I look? What other data do you need for debugging?
>> 
>> 
>> Many thanks, Richard
>> 
>> 
>> -- 
>> Richard Ems       mail: Richard.Ems at Cape-Horn-Eng.com
>> 
>> Cape Horn Engineering S.L.
>> C/ Dr. J.J. Dómine 1, 5º piso
>> 46011 Valencia
>> Tel : +34 96 3242923 / Fax 924
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>> 
>> 
>
> <°)))><
>
> http://gridengine.info/
>
> Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 
> Kirchheim-Heimstetten
> Amtsgericht Muenchen: HRB 161028
> Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
> Vorsitzender des Aufsichtsrates: Martin Haering
>
>

