[GE users] Bug causing 100% memory usage?

Richard Hobbs richard.hobbs at crl.toshiba.co.uk
Tue Mar 22 16:01:45 GMT 2005


Hello,

We are using SGE 5.3p1 at the moment, and while the GUI isn't great, it is
at least working for us, apart from a couple of bugs...

The first bug is not *that* serious - we have set up a consumable resource
on each execution host whose initial value equals the number of CPUs in
the machine. Each host may have 10 queues or more, but if it only has two
CPUs, it should never run more than 2 jobs at once. However, when lots of
jobs are submitted at once, a machine sometimes ends up running more than
2 jobs, and the remaining amount of the consumable resource drops to -1,
or occasionally even -2 or -3. Is there a way we can stop this happening?
Is it a known bug for which there is a fix in a later version?
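
To make the accounting concrete, here is a minimal Python sketch of the
bookkeeping we expect versus what we observe. It is only an illustration:
the function names are hypothetical, it is not SGE code, and it assumes
each job consumes exactly one unit of the consumable.

```python
def remaining_consumable(capacity, running_jobs, per_job=1):
    """Consumable left on a host after charging every running job.

    capacity     -- initial value of the consumable (number of CPUs here)
    running_jobs -- jobs currently running on the host
    per_job      -- units each job consumes (assumed to be 1)
    """
    return capacity - running_jobs * per_job


def is_oversubscribed(capacity, running_jobs, per_job=1):
    """True when more has been consumed than the host was configured with."""
    return remaining_consumable(capacity, running_jobs, per_job) < 0


# A two-CPU host: 2 jobs is the expected ceiling, 3 to 5 is what we sometimes see.
for jobs in (2, 3, 4, 5):
    print(jobs, remaining_consumable(2, jobs), is_oversubscribed(2, jobs))
```

With a capacity of 2, three running jobs leave the remaining value at -1,
and five leave it at -3, which matches the negative values we see.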

The second bug is more serious. We have around 38 machines, with a total of
just over 100 consumable resources (2 or 4 queues in use simultaneously per
machine). However, one of our users accidentally submitted 1021 jobs
yesterday, and the result of this was that the qmaster process started
eating memory until all of the RAM and swap were full, and then the machine
simply hung and had to be hard-rebooted. The machine has 1GB RAM and 2GB
swap.

Is this also a bug, or was the qmaster legitimately asking for more than 3GB
of RAM in order to complete its scheduling operation? If it is a bug, is
there a fix or a patch?
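
For a sense of scale, the following rough Python sketch works out how much
memory per pending job the figures above would imply. It assumes 1GB means
1024MB, that memory use is spread evenly across jobs, and that the whole of
RAM plus swap was available to the qmaster; none of these is likely to be
exactly true, and the function name is just for illustration.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def memory_per_job_mb(ram_gb, swap_gb, jobs):
    """Average memory (in MB) per job if RAM plus swap is shared evenly."""
    total_mb = (ram_gb + swap_gb) * MB_PER_GB
    return total_mb / jobs


# 1GB RAM + 2GB swap exhausted by 1021 submitted jobs.
print(round(memory_per_job_mb(1, 2, 1021), 2))
```

That comes to roughly 3MB per submitted job, which seems like a lot for
scheduling bookkeeping and is why I suspect a bug.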

Also, if there is any information or log files I can send you to help you
diagnose this issue, please let me know.

Thanks in advance,
Richard Hobbs.

-- 
Richard Hobbs (Systems Administrator)
Toshiba Research Europe Ltd. - Speech Technology Group
Web: http://www.toshiba-europe.com/research/
Email: richard.hobbs at crl.toshiba.co.uk
Tel: +44 1223 376964        Mobile: +44 7811 803377


