Fwd: [GE users] sge_qmaster memory spike

Kirk Patton kpatton at montalvosystems.com
Wed May 16 15:09:08 BST 2007



Can anyone point me to a reference explaining what the values in the output mean when profiling is turned on?

other          : wc =  21219.550s, utime =   3960.600s, stime =    776.310s, utilization =  22%
communication  : wc =      0.000s, utime =      0.000s, stime =      0.000s, utilization =   0%
packing        : wc =      0.000s, utime =      0.000s, stime =      0.000s, utilization =   0%
eventclient    : wc =      0.000s, utime =      0.000s, stime =      0.000s, utilization =   0%
eventmaster    : wc =      0.000s, utime =      0.000s, stime =      0.000s, utilization =   0%
mirror         : wc =      0.000s, utime =      0.000s, stime =      0.000s, utilization =   0%
spooling       : wc =      0.350s, utime =      0.020s, stime =      0.340s, utilization = 103%
spooling-io    : wc =    219.240s, utime =     43.740s, stime =      8.320s, utilization =  24%
spooling-script: wc =      0.000s, utime =      0.000s, stime =      0.000s, utilization =   0%
gdi            : wc =      0.000s, utime =      0.000s, stime =      0.000s, utilization =   0%
gdi_request    : wc =      0.000s, utime =      0.000s, stime =      0.000s, utilization =   0%
ht-resize      : wc =      0.000s, utime =      0.000s, stime =      0.000s, utilization =   0%
total          : wc =  21439.140s, utime =   4004.360s, stime =    784.970s, utilization =  22%
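The numbers in this dump are internally consistent in a way that hints at their meaning. Reading wc as wall-clock seconds, utime as user-mode CPU seconds and stime as kernel-mode CPU seconds (my interpretation, not confirmed against the SGE documentation or sources), utilization works out to (utime + stime) / wc on every row, which is how the spooling row reaches 103%. The total row is also the sum of the other rows. The Python sketch below re-derives both from a few rows copied from above; ROW_RE, ProfRow, parse_profile and totals_match are hypothetical names made up for illustration.

```python
import math
import re
from dataclasses import dataclass

# Matches one row of the profiling dump, e.g.
# "spooling-io    : wc =    219.240s, utime =     43.740s, stime =      8.320s, utilization =  24%"
ROW_RE = re.compile(
    r"^(?P<name>[\w-]+)\s*:\s*"
    r"wc\s*=\s*(?P<wc>[\d.]+)s,\s*"
    r"utime\s*=\s*(?P<utime>[\d.]+)s,\s*"
    r"stime\s*=\s*(?P<stime>[\d.]+)s,\s*"
    r"utilization\s*=\s*(?P<util>\d+)%"
)


@dataclass
class ProfRow:
    name: str
    wc: float           # assumed meaning: wall-clock seconds
    utime: float        # assumed meaning: user-mode CPU seconds
    stime: float        # assumed meaning: kernel-mode CPU seconds
    reported_util: int  # utilization percentage as printed

    def computed_util(self) -> int:
        """(utime + stime) / wc as a rounded percentage; 0 when wc is 0."""
        if self.wc == 0:
            return 0
        return round(100 * (self.utime + self.stime) / self.wc)


def parse_profile(text: str) -> dict:
    """Return one ProfRow per recognised line, keyed by category name."""
    rows = {}
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            rows[m["name"]] = ProfRow(
                m["name"], float(m["wc"]), float(m["utime"]),
                float(m["stime"]), int(m["util"]),
            )
    return rows


def totals_match(rows: dict, tol: float = 0.01) -> bool:
    """Check that the 'total' row equals the sum of all other rows."""
    parts = [r for name, r in rows.items() if name != "total"]
    total = rows["total"]
    return all(
        math.isclose(sum(getattr(r, f) for r in parts), getattr(total, f), abs_tol=tol)
        for f in ("wc", "utime", "stime")
    )


SAMPLE = """\
other          : wc =  21219.550s, utime =   3960.600s, stime =    776.310s, utilization =  22%
communication  : wc =      0.000s, utime =      0.000s, stime =      0.000s, utilization =   0%
spooling       : wc =      0.350s, utime =      0.020s, stime =      0.340s, utilization = 103%
spooling-io    : wc =    219.240s, utime =     43.740s, stime =      8.320s, utilization =  24%
total          : wc =  21439.140s, utime =   4004.360s, stime =    784.970s, utilization =  22%
"""

if __name__ == "__main__":
    rows = parse_profile(SAMPLE)
    for row in rows.values():
        print(f"{row.name:15} reported={row.reported_util:3}%  computed={row.computed_util():3}%")
    print("total = sum of rows:", totals_match(rows))
```

If that reading is right, the overall 22% means the profiled thread was on a CPU for less than a quarter of the measured wall-clock time.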

My sge_qmaster stopped scheduling once again and had to be restarted. I am trying to get some idea of where
to look for the cause. I changed my execd_spool_dir to use local disk rather than NFS, but that did
not fix the problem. Both sge_qmaster and sge_schedd on the master continue to grow in memory use.

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 8275 sgeadmin 20  0 5889m 4.1g 1748 R   98 52.4  3291:30 sge_schedd
                           ^^^^
 8259 sgeadmin 16  0 4893m 3.2g 7372 S    5 40.6  1542:00 sge_qmaster
                           ^^^^
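A quick sum over the top output above shows how close these two daemons alone come to filling the 8 GB master. The sketch assumes top's default task columns (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and that its m/g suffixes are binary MiB/GiB with bare numbers in KiB; top_size_to_kib and parse_top_row are hypothetical helpers, not part of any SGE tool.

```python
# Multipliers for top's size suffixes (assumed binary); bare numbers are KiB.
UNITS_KIB = {"": 1, "k": 1, "m": 1024, "g": 1024**2, "t": 1024**3}


def top_size_to_kib(field: str) -> float:
    """Convert a top size field such as '1748', '5889m' or '4.1g' to KiB."""
    field = field.strip().lower()
    suffix = field[-1] if field[-1].isalpha() else ""
    number = field[:-1] if suffix else field
    return float(number) * UNITS_KIB[suffix]


def parse_top_row(line: str) -> dict:
    """Pick the memory-related columns out of one top task line."""
    cols = line.split()
    # Assumed column order:
    # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    return {
        "pid": int(cols[0]),
        "command": cols[11],
        "virt_kib": top_size_to_kib(cols[4]),
        "res_kib": top_size_to_kib(cols[5]),
        "mem_pct": float(cols[9]),
    }


ROWS = [
    "8275 sgeadmin 20 0 5889m 4.1g 1748 R 98 52.4 3291:30 sge_schedd",
    "8259 sgeadmin 16 0 4893m 3.2g 7372 S 5 40.6 1542:00 sge_qmaster",
]

if __name__ == "__main__":
    procs = [parse_top_row(r) for r in ROWS]
    res_gib = sum(p["res_kib"] for p in procs) / 1024**2
    mem_pct = sum(p["mem_pct"] for p in procs)
    # Estimate physical RAM from one row: RES / (%MEM / 100).
    est_ram_gib = procs[0]["res_kib"] / (procs[0]["mem_pct"] / 100) / 1024**2
    print(f"combined RES: {res_gib:.1f} GiB ({mem_pct:.1f}% of RAM)")
    print(f"implied RAM: ~{est_ram_gib:.1f} GiB")
```

For these two rows it reports about 7.3 GiB resident (93% of RAM), and RES divided by %MEM implies roughly 7.8 GiB of physical memory, which fits the swapping described in the original message.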

Thanks
Kirk

----- "Kirk Patton" <kpatton at montalvosystems.com> wrote:
> Hello,
> 
> We are running SGE 6.0u10.  We have been noticing that sge_qmaster's
> memory consumption steadily grows for about two days and then spikes
> up quickly.  Then, after about 45 minutes, the memory gets released
> and the cycle starts over again.  
> 
> During the peaks, the system becomes sluggish and unresponsive to user
> queries.  Our execd_spool_dir has been on NFS and I have been moving
> it to local disk on each exec host in the hopes of alleviating the
> problem.  Looking at the utilization graphs we keep to track host
> performance, the issue still seems to be present.
> 
> I am wondering what steps I can take to track down what is causing the
> high memory utilization.  The SGE master has 8 GB of system RAM and
> during the peak of the cycle, memory is maxed out and the system
> begins swapping.  
> 
> Profiling is enabled for the scheduler.  I am wondering if there is a
> how-to or primer for interpreting the profiler metrics.  
> 
> I have attached a graph illustrating what I am seeing.
> 
> Thanks for any suggestions.
> Kirk
> 
> -- 
> Kirk Patton x5585
> Sr. systems Administrator
> Montalvo Systems
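To see what happens during the two-day growth and the spike described above, it may help to log the resident size of the daemons at a fixed interval and line the timestamps up against cluster activity. The sketch below assumes the master runs Linux and reads the Name and VmRSS fields of /proc/<pid>/status; the process names come from the top output in this thread, while the log file name, the 60-second interval and the helper names (parse_status, snapshot, log_forever) are arbitrary choices for illustration.

```python
import os
import time


def parse_status(text):
    """Return (name, vmrss_kib) from a /proc/<pid>/status dump.

    vmrss_kib is None for kernel threads, which have no VmRSS line.
    """
    name, rss = None, None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss = int(value.split()[0])  # value looks like "   3355443 kB"
    return name, rss


def snapshot(names):
    """Return {(name, pid): rss_kib} for running processes whose Name is in names."""
    found = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                name, rss = parse_status(f.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if name in names and rss is not None:
            found[(name, int(entry))] = rss
    return found


def log_forever(names, log_path="sge_mem.log", interval_s=60):
    """Append one 'epoch name pid rss_kib' line per matching process each interval."""
    while True:
        now = int(time.time())
        with open(log_path, "a") as log:
            for (name, pid), rss in sorted(snapshot(names).items()):
                log.write(f"{now} {name} {pid} {rss}\n")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Process names as they appear in the top output in this thread.
    log_forever({"sge_qmaster", "sge_schedd"})
```

Each log line holds epoch seconds, process name, PID and RSS in KiB, so the result can be plotted next to the existing utilization graphs.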


