[GE users] Memory leak in 6.1u2 ? (plus weekly build binaries)

Reuti reuti at staff.uni-marburg.de
Wed Jan 30 10:34:37 GMT 2008


On 30.01.2008 at 10:04, Andreas.Haas at Sun.COM wrote:

> Hi Henk,
>
> On Tue, 29 Jan 2008, SLIM H.A. wrote:
>
>> Hi Andreas
>>
>> Last Sunday night the scheduler consumed all available memory  
>> within a couple of hours and we had to restart it. We run 6.1u2  
>> and it seems similar to the problem reported here.
>> Is this supposed to be fixed in 6.1u3, issue
>> 2187     6562190   memory leak in sge_schedd in
>>
>> http://gridengine.sunsource.net/project/gridengine/61patches.txt
>>
>> Is it worthwhile to upgrade?
>
> Upgrading from 6.1u2 to 6.1u3 won't bring any improvement, since
> u2 already contains the first part of the fix. For the second
> part you will need 6.1u4, which is not yet released.
>
> That means if you want the issue resolved quickly, you can either
> build a scheduler binary from the latest V61_BRANCH sources or
> join with other community members and start a petition to make
> the weekly build of the most recent V61_BRANCH sources available
> for download to anyone.

Great idea! - Reuti


> The binaries of such a build would not go through any significant
> QA, but check-ins into V61_BRANCH are handpicked anyway and, after
> peer review, are watched by many critical eyes.
>
> Regards,
> Andreas
>
>>
>> Thanks
>>
>> Henk
>>
>>> -----Original Message-----
>>> From: Andreas.Haas at Sun.COM [mailto:Andreas.Haas at Sun.COM]
>>> Sent: 15 January 2008 18:10
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] Memory leak in 6.1u2 ?
>>>
>>> Hi Richard,
>>>
>>> actually when I tested valgrind (version 3.2.3) with schedd I
>>> could stop it with a simple qconf -ks.
>>>
>>> Could you file an issue at
>>>
>>>     http://gridengine.sunsource.net/servlets/ProjectIssues
>>>
>>> and describe in detail the setup and the case where you
>>> observe the leak?
>>> Please add sample configurations of queues, hosts, PE,
>>> scheduler configuration and also the accounting.
>>>
>>> Regards,
>>> Andreas
>>>
>>> On Tue, 15 Jan 2008, Richard Ems wrote:
>>>
>>>> Hi list,
>>>>
>>>> the memory leak is there again, and now I'm trying to use valgrind,
>>>> but without success. 8( The problem seems to be that I cannot make
>>>> sge_schedd exit without killing it with SIGKILL (kill -9 ...);
>>>> neither SIGTERM nor qconf -ks stops the scheduler.
>>>> It eats up over 3 GB of memory on this 4 GB system and
>>>> does not exit.
>>>>
>>>> After killing it with "kill -9", I get no more output than
>>>>
>>>> =============================================================
>>>> # cat valgrind-sge_schedd-debug.out
>>>> ==2897== Memcheck, a memory error detector.
>>>> ==2897== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
>>>> ==2897== Using LibVEX rev 1732, a library for dynamic binary translation.
>>>> ==2897== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
>>>> ==2897== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
>>>> ==2897== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
>>>> ==2897== For more details, rerun with: -v
>>>> ==2897== starting up GE 6.1u2 (lx24-amd64)
>>>> =============================================================
>>>>
>>>> in the valgrind output file, since I am killing valgrind!
>>>> Is there another way to stop the scheduler without having to
>>>> kill valgrind?
>>>>
>>>> If I don't kill it, at some point valgrind aborts on its own with
>>>>
>>>> =============================================================
>>>> Valgrind's memory management: out of memory:
>>>>   newSuperblock's request for 1048576 bytes failed.
>>>>   5026193408 bytes have already been allocated.
>>>> Valgrind cannot continue.  Sorry.
>>>> =============================================================
>>>>
>>>> See appended files from 2 valgrind runs.
>>>>
>>>> regards, Richard
>>>>
>>>>
>>>> --
>>>> Richard Ems       mail: Richard.Ems at Cape-Horn-Eng.com
>>>>
>>>> Cape Horn Engineering S.L.
>>>> C/ Dr. J.J. Dómine 1, 5º piso
>>>> 46011 Valencia
>>>> Tel : +34 96 3242923 / Fax 924
>>>>
>>>
>>> <°)))><
>>>
>>> http://gridengine.info/
>>>
>>> Registered office: Sun Microsystems GmbH, Sonnenallee 1,
>>> D-85551 Kirchheim-Heimstetten
>>> Munich District Court: HRB 161028
>>> Managing directors: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
>>> Chairman of the Supervisory Board: Martin Haering
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>>

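For anyone seeing the same symptom, here is a minimal sketch (not from the thread) of logging the resident memory of sge_schedd at intervals, so the growth is visible before the host runs out of memory. It assumes a Linux host with /proc mounted. The function names and the 60-second default interval are hypothetical choices made for illustration.

```python
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vmrss_kb(f.read())


def watch(pid, interval_s=60):
    """Print a timestamped RSS sample until the process disappears."""
    while True:
        try:
            rss = read_rss_kb(pid)
        except (FileNotFoundError, ProcessLookupError):
            print("process %d is gone" % pid)
            return
        print("%s rss_kb=%s" % (time.strftime("%Y-%m-%d %H:%M:%S"), rss))
        time.sleep(interval_s)
```

Call watch(pid) with the PID of the running sge_schedd and redirect the output to a file. A steadily rising rss_kb between scheduling runs would match the behaviour described above.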
