[GE users] Bug causing 100% memory usage?
richard.hobbs at crl.toshiba.co.uk
Tue Mar 22 16:01:45 GMT 2005
We are using SGE 5.3p1 at the moment, and while the GUI isn't great, it is
at least working for us, apart from a couple of bugs...
The first bug is not *that* serious. We have set up a consumable resource
on each execution host whose initial value equals the number of CPUs in
the machine. Each host may have 10 queues or more, but if it only has two
CPUs, it should only ever be able to run 2 jobs or fewer. However, when
lots of jobs are submitted at once, the machine sometimes ends up
running more than 2 jobs, and the amount of the consumable resource
remaining hits -1, or even -2 or -3. Is there a way we can stop
this happening? Is it a known bug for which there is a fix in a later
version?
The second bug is more serious. We have around 38 machines, with a total of
just over 100 consumable resources (2 or 4 queues in use simultaneously per
machine). However, one of our users accidentally submitted 1021 jobs
yesterday, and as a result the qmaster process started
eating memory until all RAM and swap were full, and then the machine
simply hung and had to be hard-rebooted. The machine has 1GB RAM and 2GB
swap.
Is this also a bug, or was the qmaster legitimately asking for more than 3GB
of RAM in order to complete its scheduling operation? If it is a bug, is
there a fix or a patch?
Also, if there is any information or log files I can send you to help you
diagnose this issue, please let me know.
Thanks in advance,
Richard Hobbs (Systems Administrator)
Toshiba Research Europe Ltd. - Speech Technology Group
Email: richard.hobbs at crl.toshiba.co.uk
Tel: +44 1223 376964 Mobile: +44 7811 803377