[GE users] Memory leak in 6.1u2 ?

SLIM H.A. h.a.slim at durham.ac.uk
Tue Jan 29 17:09:10 GMT 2008



Hi Andreas 

Last Sunday night the scheduler consumed all available memory within a couple of hours, and we had to restart it. We run 6.1u2, and the behaviour looks similar to the problem reported in this thread.
Is this supposed to be fixed in 6.1u3 by the following issue listed in 61patches.txt?

    2187     6562190   memory leak in sge_schedd

http://gridengine.sunsource.net/project/gridengine/61patches.txt

Is it worthwhile to upgrade?
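
To put a number on the growth before deciding on an upgrade or filing an issue, the scheduler's resident memory could be logged over time. The following is only a rough sketch, assuming a Linux host (ours is lx24-amd64) where /proc/<pid>/status exposes a VmRSS line. The function names parse_vmrss_kib, growth_kib_per_hour and watch are made up for illustration and are not part of Grid Engine.

```python
import re
import time
from pathlib import Path


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_kib_per_hour(samples):
    """Average growth between the first and last (unix_seconds, rss_kib) sample."""
    if len(samples) < 2:
        return 0.0
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (rss1 - rss0) * 3600.0 / (t1 - t0)


def watch(pid, interval_s=60, count=10):
    """Sample the resident set size of `pid` `count` times, `interval_s` apart."""
    samples = []
    for _ in range(count):
        # Linux-specific: the kernel reports the resident set size here.
        text = Path(f"/proc/{pid}/status").read_text()
        rss = parse_vmrss_kib(text)
        if rss is not None:
            samples.append((time.time(), rss))
            print(f"{time.strftime('%H:%M:%S')} VmRSS={rss} KiB")
        time.sleep(interval_s)
    return samples
```

Calling watch() with the PID of sge_schedd and passing the result to growth_kib_per_hour() gives an approximate leak rate that can be attached to an issue report.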

Thanks

Henk

> -----Original Message-----
> From: Andreas.Haas at Sun.COM [mailto:Andreas.Haas at Sun.COM] 
> Sent: 15 January 2008 18:10
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Memory leak in 6.1u2 ?
> 
> Hi Richard,
> 
> actually when I tested valgrind (version 3.2.3) with schedd I 
> could stop it with a simple qconf -ks.
> 
> Could you file an issue to
> 
>     http://gridengine.sunsource.net/servlets/ProjectIssues
> 
> and describe in detail the setup and the case where you 
> observe the leak?
> Please add sample configurations of queues, hosts, PE, 
> scheduler configuration and also the accounting.
> 
> Regards,
> Andreas
> 
> On Tue, 15 Jan 2008, Richard Ems wrote:
> 
> > Hi list,
> >
> > the memory leak is there again, and now I'm trying to use valgrind,
> > but without success. 8( The problem seems to be that I cannot make
> > sge_schedd exit without killing it with SIGKILL (kill -9 ...);
> > neither SIGTERM nor qconf -ks stops the scheduler.
> > It grows to over 3 GB of memory on this 4 GB system and does not
> > exit.
> >
> > After killing it with "kill -9", I get no more output than
> >
> > =============================================================
> > # cat valgrind-sge_schedd-debug.out
> > ==2897== Memcheck, a memory error detector.
> > ==2897== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
> > ==2897== Using LibVEX rev 1732, a library for dynamic binary translation.
> > ==2897== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
> > ==2897== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
> > ==2897== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
> > ==2897== For more details, rerun with: -v
> > ==2897==
> > starting up GE 6.1u2 (lx24-amd64)
> > =============================================================
> >
> > in the valgrind output file, since I am killing valgrind!
> > Is there another way to stop the scheduler without having to
> > kill valgrind?
> >
> > If I don't kill it, valgrind eventually terminates on its own with
> >
> > =============================================================
> > Valgrind's memory management: out of memory:
> >   newSuperblock's request for 1048576 bytes failed.
> >   5026193408 bytes have already been allocated.
> > Valgrind cannot continue.  Sorry.
> > =============================================================
> >
> > See appended files from 2 valgrind runs.
> >
> > regards, Richard
> >
> >
> > -- 
> > Richard Ems       mail: Richard.Ems at Cape-Horn-Eng.com
> >
> > Cape Horn Engineering S.L.
> > C/ Dr. J.J. Dómine 1, 5º piso
> > 46011 Valencia
> > Tel : +34 96 3242923 / Fax 924
> >
> 
> <°)))><
> 
> http://gridengine.info/
> 
> 
> 
