[GE users] qmaster memory leak

Eric Andresen eandres at mars.asu.edu
Sat Nov 6 00:19:34 GMT 2004


I can confirm this behavior on 6.0u1 and the latest CVS checkout.

This is unfortunate, as I liked the way that subordinate queues handled
automatic suspension of lower-priority queues.

Are there any plans to rework the subordinate system to (significantly)
reduce memory consumption? As it stands, having three queues, 'high',
'normal', and 'low', with low as a subordinate of both and normal as a
subordinate of high, uses approximately 890MiB of memory when each queue
has 50 nodes with two slots apiece.
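
For scale, those figures can be turned into a rough per-instance estimate.
The sketch below is only back-of-envelope arithmetic on the numbers quoted
in this message. It assumes one queue instance per cluster queue per host
and that the 890MiB is spread evenly across them; neither was measured, and
the function names are made up for illustration.

```python
# Back-of-envelope estimate from the figures in this message.
# Assumption: one queue instance per (cluster queue, host) pair, and the
# reported memory divided evenly among them. Not a measurement.

def queue_instances(cluster_queues: int, hosts_per_queue: int) -> int:
    """Number of queue instances for the given cluster queues and hosts."""
    return cluster_queues * hosts_per_queue


def mib_per_unit(total_mib: float, units: int) -> float:
    """Average memory per unit (queue instance or slot), in MiB."""
    return total_mib / units


instances = queue_instances(cluster_queues=3, hosts_per_queue=50)
slots = instances * 2  # two slots per node, as described above
per_instance = mib_per_unit(890, instances)
per_slot = mib_per_unit(890, slots)

print(f"{instances} queue instances, {slots} slots")
print(f"roughly {per_instance:.1f} MiB per queue instance")
print(f"roughly {per_slot:.1f} MiB per slot")
```

Under those assumptions the average works out to several MiB per queue
instance, which is why the growth looks like a design cost rather than a
leak.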

Thanks,
-- 
   Eric Andresen
   Systems Administrator
   Mars Space Flight Facility
   Arizona State University
   eandres at mars.asu.edu
   (480) 727-8471

On Tue, 2004-10-05 at 15:50, Mike Brown wrote:
> Andreas,
> 
> I've looked into the problem from another angle, and it doesn't seem to
> be a memory leak.  I tried recreating the queue that caused the problem
> and noticed that when I put fewer CPUs in it, the memory usage comes
> close to exceeding the machine's memory but does not keep increasing.
> 
> I think the problem is with subordinate queues.  As I increase the 
> number of queues subordinated, memory increases rapidly.  If I remove 
> the subordinate restriction, I am able to declare many different cluster 
> queues.  My current (5.3p5) configuration is similar to below, with 
> higher level queues subordinating lower level ones:
> 
> level3 (subordinates level2, level1)
> level2 (subordinates level1)
> level1
> 
> As I try to repeat this in 6.0u1, I have approximately 100 hosts at each 
> level, and hit 2GB memory usage while populating the 3rd level.  
> 
> Thanks for your help,
> 
> Mike
> 
> 
> Andreas Haas wrote:
> 
> >You might try starting it again and use 'strace' to trace into
> >it when starting qmaster manually from within a shell. Make
> >sure you do a
> >
> >   setenv SGE_ND
> >
> >otherwise strace is pointless, since qmaster would daemonize ...
> >
> >Andreas
> >
> >
> >On Mon, 4 Oct 2004, Mike Brown wrote:
> >
> >>Linux 2.4.18-27.7.x i686
> >>
> >>Rayson Ho wrote:
> >>
> >>>On which platform??
> >>>
> >>>Rayson
> >>>
> >>>>I'm trying to configure SGE 6.0u1 for the first time.  I configured
> >>>>users and cluster queues and ran successful test jobs.  After this, I
> >>>>created a clone of an existing cluster queue and suddenly qmaster was
> >>>>unresponsive.  I left it alone for several minutes, did a ps listing,
> >>>>and saw one of the qmaster processes in the Z state.
> >>>>
> >>>---------------------------------------------------------------------
> >>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>>For additional commands, e-mail: users-help at gridengine.sunsource.net






