[GE users] Bug causing 100% memory usage?

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Tue Mar 22 16:14:00 GMT 2005


Hello Richard,

is there a reason why you are still using 5.3p1 and did not upgrade to p6
or even SGE 6u3?

I did a quick query for the two bug descriptions and did not find a
matching issue. However, I have not seen these issues in 5.3p6 or
SGE 6u3. One could assume that they are fixed in the newer releases.

If not, a more detailed description of your configuration and of what the
jobs request would be very helpful. The output of qstat -F XXX for the
problematic consumable would also be very helpful.

Kind Regards,
Stephan


Richard Hobbs wrote:

>Hello,
>
>We are using SGE 5.3p1 at the moment, and while the GUI isn't great, it is
>at least working for us, apart from a couple of bugs...
>
>The first bug is not *that* serious - we have set up a consumable resource
>on each execution host whose initial value equals the number of CPUs in
>the machine. Each host may have 10 queues or more, but if it only has two
>CPUs, it should only ever be able to run 2 jobs or fewer. However, when
>lots of jobs are submitted at once, the machine sometimes ends up
>running more than 2 jobs, and the amount of the consumable resource
>remaining hits -1, or sometimes even -2 or -3. Is there a way we can stop
>this happening? Is it a known bug for which there is a fix in a later
>version?
>
>The second bug is more serious. We have around 38 machines, with a total of
>just over 100 consumable resources (2 or 4 queues in use simultaneously per
>machine). However, one of our users accidentally submitted 1021 jobs
>yesterday, and the result of this was that the qmaster process started
>eating memory until all RAM and swap were full, and then the machine
>simply hung and had to be hard-rebooted. The machine has 1GB RAM and 2GB
>swap.
>
>Is this also a bug, or was the qmaster legitimately asking for more than 3GB
>of RAM in order to complete its scheduling operation? If it is a bug, is
>there a fix or a patch?
>
>Also, if there is any information or log files I can send you to help you
>diagnose this issue, please let me know.
>
>Thanks in advance,
>Richard Hobbs.
>
>  
>





