[GE users] GridEngine v5.3p1 eating too much memory

Richard Hobbs richard.hobbs at crl.toshiba.co.uk
Wed Apr 27 11:45:44 BST 2005


Hello,

How easy is the upgrade process? I mean, do I just untar the latest version
over the top of my existing version? Is it just the binaries that have been
updated? Or do I need to do a re-install? Basically, what's the upgrade
process like?

These questions apply to 5.3p6 and 6.0u4.

Thanks again,
Hobbs.

-- 
Richard Hobbs (Systems Administrator)
Toshiba Research Europe Ltd. - Speech Technology Group
Web: http://www.toshiba-europe.com/research/
Email: richard.hobbs at crl.toshiba.co.uk
Tel: +44 1223 376964        Mobile: +44 7811 803377 

> -----Original Message-----
> From: Andy Schwierskott [mailto:andy.schwierskott at sun.com] 
> Sent: 27 April 2005 11:08
> To: 'GridEngine Mailing List'
> Subject: Re: [GE users] GridEngine v5.3p1 eating too much memory
> 
> Richard,
> 
> There have been memory leaks fixed in 5.3. There have also been fixes in the
> schedd-qmaster protocol in 5.3 which avoid memory overhead in certain
> situations.
> 
> Please always check the lists of fixes which have been made in the patch
> releases, on the HOWTO pages:
> 
>    http://gridengine.sunsource.net/project/gridengine/60patches.txt
>    http://gridengine.sunsource.net/project/gridengine/53patches.txt
> 
> So 5.3p6 is the minimum you should go to, but why not upgrade to 6.0? 6.0u4 will
> be released next week or at the beginning of the week of 05/09.
> 
> Andy
> 
> > Hello,
> >
> > We are running GridEngine 5.3p1 (we never upgraded because we never had a
> > problem), and we now have a problem.
> >
> > We have around 46 execution machines (totalling 130 CPUs), 8 submit hosts,
> > and 1 qmaster, all running RedHat 8.0. We therefore have 130 queues in 'run'
> > mode at any one time.
> >
> > When lots of jobs are submitted (300 or more), the sge_schedd process starts
> > to consume memory at an alarming rate. With 331 jobs in the qstat output,
> > and 130 running, sge_schedd occupied 55% of the memory according to 'top'.
> > This, however, did not cause a problem.
> >
> > But... When more than 300 jobs are submitted, like 500 or 1000 for example,
> > this memory usage goes so high that it uses up all the 1GB RAM and the 2GB
> > swap, and the machine either ends the process itself, or the process kills
> > the entire qmaster machine, which then has to be rebooted and sometimes
> > powered off.
> >
> > Has anyone seen this problem before? Is it a bug, or just a bad, inefficient
> > algorithm within the scheduler's source code?
> >
> > Is there a fix available in a later patch level?
> >
> > Our workaround for the moment is for our researchers to check the grid
> > before they submit their jobs, but this is not ideal because I am also
> > having to monitor it non-stop. I guess a better workaround would be for the
> > researchers' scripts to run a qstat and check the number of jobs before
> > submitting new ones, but then they are basically writing their own
> > scheduling software, when GridEngine is supposed to do it for them.
> >
> > Surely 1000 jobs and 130 queues isn't a lot, right?
> >
> > Any suggestions are very much appreciated.
> >
> > Thanks in advance,
> > Richard Hobbs.
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
> 
> _____________________________________________________________________
> This e-mail has been scanned for viruses by MCI's Internet 
> Managed Scanning Services - powered by MessageLabs. For 
> further information visit http://www.mci.com
> 

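For anyone wanting to try the interim workaround mentioned in the quoted original message (having a submit script run qstat and check the number of jobs first), here is a minimal sketch in Python. It is only an illustration, not something from the thread: the names MAX_JOBS, count_jobs, may_submit and current_job_count are made up for this example, the limit of 300 is just the rough figure reported above, and it assumes that each job line in plain qstat output starts with a numeric job ID while the header and dashed separator do not.

```python
import subprocess

# Hypothetical threshold: the thread reports trouble somewhere above ~300 jobs
# on this particular qmaster; tune it for your own installation.
MAX_JOBS = 300


def count_jobs(qstat_output: str) -> int:
    """Count job lines in plain `qstat` output.

    Assumption: each job line starts with a numeric job ID, while the header
    and the dashed separator line do not.
    """
    count = 0
    for line in qstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            count += 1
    return count


def may_submit(current_jobs: int, new_jobs: int, limit: int = MAX_JOBS) -> bool:
    """Return True if submitting `new_jobs` more keeps the total within `limit`."""
    return current_jobs + new_jobs <= limit


def current_job_count() -> int:
    """Run `qstat` (must be on PATH) and count the jobs it lists."""
    result = subprocess.run(["qstat"], capture_output=True, text=True, check=True)
    return count_jobs(result.stdout)
```

A wrapper script could call may_submit(current_job_count(), n) before submitting n jobs, and sleep and retry when it returns False. This only caps the load on the scheduler; it does not address the memory growth itself, for which the patch releases mentioned above are the real fix.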





