[GE users] Memory Leak in schedd 6.1u4?

Mulley, Nikhil Nikhil.Mulley at deshaw.com
Sun Apr 13 13:17:39 BST 2008


No Deal for me :-/

Scheduler died again on the system even after setting schedd_job_info to
false. 

-----Original Message-----
From: Brian Smith [mailto:brs at usf.edu] 
Sent: Friday, April 11, 2008 7:50 PM
To: users at gridengine.sunsource.net
Cc: Andreas.Haas at Sun.COM
Subject: Re: [GE users] Memory Leak in schedd 6.1u4?

I suppose that in yet another oft cited case of software "voodoo", where
problems appear to magically disappear, the leak appears to have stopped
or at least slowed by many orders of magnitude with the mallinfo()
binary (at least on my system).  As I reviewed the patch that Andreas
sent me, I can attest to the unlikelihood that this particular patch had
anything to do with "fixing" a memory leak.  The only two things I can
think of:

1) (obvious question) Is the source for schedd that this patch was built
against the same revision as that released with 6.1u4?
2) Did I change anything on my system?  Are conditions similar? (No... I
even left schedd_job_info enabled... and Yes... in fact, we're queuing
ten times the number of jobs we were the day after the initial upgrade
where we saw schedd consume all ram in a matter of < 12hrs)

I've restarted schedd w/ mallinfo several times to no avail... no 
leaks.  I'll go back to the original binary and see if it behaves 
similarly. 

-Brian
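
A rough way to compare the original binary against the mallinfo() build
is to sample the resident memory of sge_schedd at intervals and look at
the growth rate. The following is only a minimal sketch, not something
posted in this thread: it assumes a Linux host with /proc (so it does not
apply to the Solaris case), and the function names are made up for
illustration.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_kb_per_hour(samples):
    """Average growth between the first and last (seconds, rss_kb) samples."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need two samples taken at different times")
    return (r1 - r0) * 3600.0 / (t1 - t0)


def sample_pid(pid, count, interval_s):
    """Collect `count` RSS samples for a live process, `interval_s` apart."""
    samples = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as fh:
            samples.append((time.time(), parse_vmrss_kb(fh.read())))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Running sample_pid() against the schedd PID for a few hours under each
binary and passing the result to growth_kb_per_hour() gives one number
per binary to compare. A rate near zero under the same job load would
suggest the leak really is gone rather than just slower.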

Daniel Templeton wrote:
> No restart required.
>
> Daniel
>
> Mulley, Nikhil wrote:
>> Roland, wondering if this parameter change in the scheduler
>> configuration would ask for any scheduler(qmaster+scheduler) restart?
>> -----Original Message-----
>> From: Roland.Dittel at Sun.COM [mailto:Roland.Dittel at Sun.COM]
>> Sent: Wednesday, April 09, 2008 7:26 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Memory Leak in schedd 6.1u4?
>>
>> Nikhil,
>>
>> Andreas and I believe the culprit is schedd_job_info in the
>> scheduler config. Can you please set it to false and give it a try?
>>
>> Best regards
>> Roland
>>
>> Mulley, Nikhil wrote:
>>> Andreas, while you are at it with the Linux binary, could you please
>>> provide me a sol-amd64 binary with mallinfo (or whatever) logging for
>>> the Solaris platform?
>>> I can chip in to provide any other necessary information.
>>>
>>> Thanks,
>>> Nikhil
>>>
>>> -----Original Message-----
>>> From: Andreas.Haas at Sun.COM [mailto:Andreas.Haas at Sun.COM]
>>> Sent: Monday, April 07, 2008 12:51 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] Memory Leak in schedd 6.1u4?
>>>
>>> Hi Brian,
>>>
>>> from your mentioning of
>>>
>>>     http://linux-mm.org/OOM_Killer
>>>
>>> I conclude you ran into this under a Linux distribution. If it is
>>> RHEL4/amd64, I have a ready-built binary that comes with mallinfo(3)
>>> logging, and I'm confident this will help us find the evildoer.
>>>
>>> Regards,
>>> Andreas
>>>
>>>
>>> On Sun, 6 Apr 2008, Brian Smith wrote:
>>>
>>>    
>>>> Just restarted schedd with valgrind... I should have output the
next
>>>>       
>>> time the    
>>>> oom_killer kicks in. Are the "retail" binaries built with debugging
>>>>       
>>> symbols    
>>>> or will I need to build my own schedd?
>>>>
>>>> -Brian
>>>>
>>>> Chris Dagdigian wrote:
>>>>> Hi Brian,
>>>>>
>>>>> People have been reporting schedd leaks in the 6.1u series - some problems
>>>>> were found and fixed but there are indications on the user list that the
>>>>> problem still remains. Multiple people have been trying to track down the
>>>>> cause -- any additional eyeballs and details will undoubtedly be welcome,
>>>>> especially if you can run under valgrind or other tools that may help trace
>>>>> down the offending code.
>>>>>
>>>>> -Chris
>>>>>
>>>>>
>>>>> On Apr 6, 2008, at 1:11 PM, Brian Smith wrote:
>>>>>> Has anyone else noticed a memory leak with 6.1u4? oom-killer is stopping
>>>>>> my sge_schedd because it's wolfing down gobs of ram. The box I'm running on
>>>>>> has 4GB and is used only for nis, qmaster/schedd, and managing
>>>>>> and provisioning the cluster. I'll post more details if desired.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> -Brian
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>
>>> http://gridengine.info/
>>>
>>> Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551
>>> Kirchheim-Heimstetten
>>> Amtsgericht Muenchen: HRB 161028
>>> Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
>>> Vorsitzender des Aufsichtsrates: Martin Haering
>>>
>