[GE users] GridEngine v5.3p1 eating too much memory

Andy Schwierskott andy.schwierskott at sun.com
Wed Apr 27 11:08:10 BST 2005


There have been memory leaks fixed in 5.3. There have also been fixes in the
schedd-qmaster protocol in 5.3 which avoid memory overhead for certain

Please always check the list of fixes that have been made in the patch
releases on the HOWTO pages.


So 5.3p6 is the minimum you should run, but why not upgrade to 6.0? 6.0u4 will
be released next week or at the beginning of the week of 05/09.


> Hello,
> We are running GridEngine 5.3p1 (we never upgraded because we never had a
> problem), and we now have a problem.
> We have around 46 execution machines (totalling 130 CPUs), 8 submit hosts,
> and 1 qmaster, all running RedHat 8.0. We therefore have 130 queues in 'run'
> mode at any one time.
> When lots of jobs are submitted (300 or more), the sge_schedd process starts
> to consume memory at an alarming rate. With 331 jobs in the qstat output,
> and 130 running, sge_schedd occupied 55% of the memory according to 'top'.
> This, however, did not cause a problem.
> But... When more than 300 jobs are submitted, like 500 or 1000 for example,
> this memory usage goes so high, that it uses up all the 1GB RAM, and the 2GB
> swap, and the machine either ends the process itself, or the process kills
> the entire qmaster machine, which then has to be rebooted and sometimes
> powered off.
> Has anyone seen this problem before? Is it a bug, or just a bad, inefficient
> algorithm within the scheduler's source code?
> Is there a fix available in a later patch level?
> Our workaround for the moment is for our researchers to check the grid
> before they submit their jobs, but this is not ideal because I am also
> having to monitor it non-stop. I guess a better workaround would be for the
> researchers' scripts to run a qstat and check the number of jobs before
> submitting new ones, but then they are basically writing their own
> scheduling software, when GridEngine is supposed to do it for them.
> Surely 1000 jobs and 130 queues isn't a lot, right?
> Any suggestions are very much appreciated.
> Thanks in advance,
> Richard Hobbs.
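
The qstat-based workaround described in the quoted message could look roughly
like the sketch below. It is only an illustration, not something shipped with
Grid Engine: the names count_jobs, may_submit, submit_when_clear and the
MAX_JOBS threshold are made up here, and it assumes that every job line in
plain `qstat` output starts with a numeric job ID (header and separator lines
do not). Upgrading to a fixed patch level remains the real solution.

```python
import subprocess
import time

# Hypothetical threshold, based on the ~300 jobs mentioned in the post.
MAX_JOBS = 300


def count_jobs(qstat_output: str) -> int:
    """Count distinct job IDs in plain `qstat` output.

    Assumption: each job line starts with a numeric job ID. Header and
    separator lines are skipped because their first token is not a number.
    A job that appears on several lines is counted once.
    """
    job_ids = set()
    for line in qstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            job_ids.add(fields[0])
    return len(job_ids)


def may_submit(qstat_output: str, limit: int = MAX_JOBS) -> bool:
    """Return True if the number of listed jobs is below the limit."""
    return count_jobs(qstat_output) < limit


def submit_when_clear(script_path: str, poll_seconds: int = 60) -> None:
    """Poll `qstat` and run `qsub script_path` once the job count is low enough."""
    while True:
        result = subprocess.run(
            ["qstat"], capture_output=True, text=True, check=True
        )
        if may_submit(result.stdout):
            subprocess.run(["qsub", script_path], check=True)
            return
        time.sleep(poll_seconds)
```

A researcher's wrapper would call submit_when_clear("myjob.sh") instead of
calling qsub directly. Note that this only throttles submission on the client
side and does nothing about the memory growth in sge_schedd itself.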

