[GE users] sge_qmaster memory spike

Kirk Patton kpatton at montalvosystems.com
Mon May 14 15:24:53 BST 2007



Hello,

We are running SGE 6.0u10. We have noticed that sge_qmaster's memory consumption grows steadily for about two days and then spikes sharply. About 45 minutes later, the memory is released and the cycle starts over.

During the peaks, the system becomes sluggish and unresponsive to user queries. Our execd_spool_dir has been on NFS, and I have been moving it to local disk on each exec host in the hope of alleviating the problem. Judging by the utilization graphs we keep to track host performance, the issue still seems to be present.

I am wondering what steps I can take to track down the cause of the high memory utilization. The SGE master has 8 GB of system RAM; at the peak of the cycle, memory is maxed out and the system begins swapping.
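
One possible starting point is to log the resident memory of the sge_qmaster process at fixed intervals, so the growth and the spike can be lined up against timestamps in the qmaster messages file. The following is only a minimal sketch, not an SGE tool: it assumes the master runs on Linux with a /proc filesystem, and the function names, the log file name, and the 60-second interval are all hypothetical choices made for illustration.

```python
import re
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux /proc assumed)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kb(status_file.read())


def format_sample(timestamp, rss_kb):
    """Format one log line: local time followed by the RSS in kB."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return f"{stamp} rss_kb={rss_kb}"


def monitor(pid, interval_s=60, log_path="qmaster_rss.log"):
    """Append one sample per interval until interrupted (hypothetical defaults)."""
    while True:
        with open(log_path, "a") as log:
            log.write(format_sample(time.time(), read_rss_kb(pid)) + "\n")
        time.sleep(interval_s)
```

Calling monitor() with the PID of sge_qmaster would append one line per minute to the log; the timestamps could then be compared with scheduler runs or job submission bursts around the time of the spike.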

Profiling is enabled for the scheduler.  I am wondering if there is a how-to or primer for interpreting the profiler metrics.  

I have attached a graph illustrating what I am seeing.

Thanks for any suggestions.
Kirk

-- 
Kirk Patton x5585
Sr. Systems Administrator
Montalvo Systems


    [ Attachment: "graph.png" (image/png, 15 KB) ]


