[GE users] Memory Leak in schedd 6.1u4?

Brian Smith brs at usf.edu
Fri Apr 11 15:19:42 BST 2008


I suppose this is yet another oft-cited case of software "voodoo", where 
problems appear to magically disappear: with the mallinfo() binary, the 
leak appears to have stopped, or at least slowed by many orders of 
magnitude (at least on my system).  Having reviewed the patch that 
Andreas sent me, I can attest that it is unlikely to have had anything 
to do with "fixing" a memory leak.  The only two things I can think of:

1) (obvious question) Was the schedd source that this patch was built 
against the same revision as the one released with 6.1u4?
2) Did I change anything on my system?  Are conditions similar?  (No, I 
even left schedd_job_info enabled; and yes, in fact we're queuing ten 
times the number of jobs we were the day after the initial upgrade, 
when we saw schedd consume all RAM in less than 12 hours.)

I've restarted schedd with the mallinfo binary several times and have 
not been able to reproduce the leak.  I'll go back to the original 
binary and see if it behaves similarly.

-Brian
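
For readers following the thread, here is a rough idea of what
mallinfo(3)-style logging can look like. This is only a sketch in C, not
the instrumentation in the binary Andreas provided. The helper names
(heap_in_use, format_heap_stats, log_heap_stats) and the log format are
made up for illustration, and it assumes glibc (mallinfo2() is used
where glibc 2.33 or later makes it available).

```c
#include <malloc.h>   /* mallinfo()/mallinfo2(): glibc-specific */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Use the non-deprecated call on glibc >= 2.33, else fall back. */
#if defined(__GLIBC__)
# if __GLIBC_PREREQ(2, 33)
#  define HEAP_INFO_T struct mallinfo2
#  define HEAP_INFO() mallinfo2()
# endif
#endif
#ifndef HEAP_INFO_T
# define HEAP_INFO_T struct mallinfo
# define HEAP_INFO() mallinfo()
#endif

/* Hypothetical helper: bytes currently handed out by malloc (uordblks). */
size_t heap_in_use(void)
{
    HEAP_INFO_T mi = HEAP_INFO();
    return (size_t)mi.uordblks;
}

/* Hypothetical helper: format a one-line heap snapshot into buf.
 * Returns what snprintf returns (characters that were or would be written). */
int format_heap_stats(char *buf, size_t len, const char *tag)
{
    HEAP_INFO_T mi = HEAP_INFO();
    return snprintf(buf, len,
                    "%ld %s arena=%zu in_use=%zu free=%zu mmapped=%zu",
                    (long)time(NULL), tag,
                    (size_t)mi.arena,     /* bytes obtained via brk/sbrk  */
                    (size_t)mi.uordblks,  /* bytes in allocated chunks    */
                    (size_t)mi.fordblks,  /* bytes in free chunks         */
                    (size_t)mi.hblkhd);   /* bytes in mmap()ed regions    */
}

/* Hypothetical helper: write the snapshot to a stream such as stderr. */
void log_heap_stats(FILE *out, const char *tag)
{
    char line[256];
    format_heap_stats(line, sizeof line, tag);
    fprintf(out, "%s\n", line);
}
```

Calling something like log_heap_stats(stderr, "after scheduling run") at
the same point in every scheduling interval, and watching whether in_use
keeps climbing under a steady workload, is one way to tell a real leak
from ordinary heap growth.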

Daniel Templeton wrote:
> No restart required.
>
> Daniel
>
> Mulley, Nikhil wrote:
>> Roland, I'm wondering whether this parameter change in the scheduler
>> configuration requires a restart of the scheduler (qmaster + scheduler)?
>> -----Original Message-----
>> From: Roland.Dittel at Sun.COM [mailto:Roland.Dittel at Sun.COM]
>> Sent: Wednesday, April 09, 2008 7:26 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Memory Leak in schedd 6.1u4?
>>
>> Nikhil,
>>
>> Andreas and I believe the culprit is schedd_job_info in the 
>> scheduler config. Can you please set it to false and give it a try?
>>
>> Best regards
>> Roland
>>
>> Mulley, Nikhil wrote:
>>> Andreas, while you are at it with the Linux binary, could you please
>>> provide me a sol-amd64 binary with mallinfo (or whatever logging is
>>> available) on the Solaris platform?
>>> I can chip in to provide any other necessary information.
>>>
>>> Thanks,
>>> Nikhil
>>>
>>> -----Original Message-----
>>> From: Andreas.Haas at Sun.COM [mailto:Andreas.Haas at Sun.COM]
>>> Sent: Monday, April 07, 2008 12:51 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] Memory Leak in schedd 6.1u4?
>>>
>>> Hi Brian,
>>>
>>> from your mention of
>>>
>>>     http://linux-mm.org/OOM_Killer
>>>
>>> I conclude that you ran into this under a Linux distribution. If it is
>>> RHEL4/amd64, I have a ready-built binary that comes with mallinfo(3)
>>> logging, and I'm confident this will help us find the evildoer.
>>>
>>> Regards,
>>> Andreas
>>>
>>>
>>> On Sun, 6 Apr 2008, Brian Smith wrote:
>>>
>>>> Just restarted schedd with valgrind... I should have output the next
>>>> time the oom_killer kicks in. Are the "retail" binaries built with
>>>> debugging symbols, or will I need to build my own schedd?
>>>>
>>>> -Brian
>>>>
>>>> Chris Dagdigian wrote:
>>>>      
>>>>> Hi Brian,
>>>>>
>>>>> People have been reporting schedd leaks in the 6.1u series - some
>>>>>         
>>> problems    
>>>>> were found and fixed but there are indications on the user list that
>>>>>         
>>> the    
>>>>> problem still remains. Multiple people have been trying to track
>>>>>         
>> down
>>  
>>> the    
>>>>> cause -- any additional eyeballs and details will undoubtably be
>>>>>         
>>> welcome,    
>>>>> especially if you can run under valgrind or other tools that may
>>>>>         
>> help
>>  
>>> trace    
>>>>> down the offending code.
>>>>>
>>>>> -Chris
>>>>>
>>>>>
>>>>> On Apr 6, 2008, at 1:11 PM, Brian Smith wrote:
>>>>>> Has anyone else noticed a memory leak with 6.1u4? oom-killer is
>>>>>> stopping my sge_schedd because it's wolfing down gobs of RAM. The box
>>>>>> I'm running on has 4GB and is used only for NIS, qmaster/schedd, and
>>>>>> managing and provisioning the cluster. I'll post more details if
>>>>>> desired.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> -Brian
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>> http://gridengine.info/
>>>
>>> Registered office: Sun Microsystems GmbH, Sonnenallee 1, D-85551
>>> Kirchheim-Heimstetten
>>> Munich District Court (Amtsgericht Muenchen): HRB 161028
>>> Managing directors: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
>>> Chairman of the Supervisory Board: Martin Haering
>>>





