[GE users] Overallocated cluster

craffi dag at sonsorol.org
Wed May 13 09:57:16 BST 2009

Two main ways:

  - reduce the number of job slots available on each node, thus
reducing how many jobs can run at once

  - lower the load alarm threshold value. That will close off nodes
when they get "too busy", and by adjusting the alarm threshold you
get to define exactly what "too busy" means
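
As a rough sketch of both options, something like the following could be used. The queue name all.q, the slot count of 2 (one per CPU on a dual-CPU node) and the np_load_avg value of 1.0 are assumptions for illustration, not values taken from your cluster. np_load_avg is the load average divided by the number of CPUs, so 1.0 roughly corresponds to "100%". The script only prints the qconf commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch only: the queue name and values below are assumptions.
QUEUE="all.q"      # assumed cluster queue name
SLOTS=2            # assumed: one slot per CPU on a dual-CPU node
THRESHOLD="1.0"    # np_load_avg = load average / number of CPUs

# Option 1: limit how many jobs can run at once in the queue.
slots_cmd() {
    echo "qconf -mattr queue slots $SLOTS $QUEUE"
}

# Option 2: put the queue into alarm state (no new jobs are started)
# once the normalized load goes above the threshold.
load_cmd() {
    echo "qconf -mattr queue load_thresholds np_load_avg=$THRESHOLD $QUEUE"
}

# Print the commands for review instead of running them.
slots_cmd
load_cmd
```

The printed commands would have to be run as an SGE manager, and "qconf -sq all.q" should show the resulting slots and load_thresholds settings.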

I'm not 100% sure Ganglia data is the best measure of system load,
though. It may be worthwhile to confirm the overload with other tools.
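
One possible cross-check, sketched under the assumption that uptime, awk and getconf are available on the nodes: read the 1-minute load average directly on a node and divide it by the CPU count, which gives a number comparable to np_load_avg. The function names here are made up for the example. Running "qhost" on the SGE side also lists the load each execution host reports.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of SGE or Ganglia.

# Extract the 1-minute load average from uptime. Handles both
# "load average: 0.10, 0.20, 0.30" and "load averages: 0.10 0.20 0.30".
one_minute_load() {
    LC_ALL=C uptime | awk -F'load averages?: *' '{ split($2, a, /[, ]+/); print a[1] }'
}

# Number of online CPUs on this node.
cpu_count() {
    getconf _NPROCESSORS_ONLN
}

# Divide a load value by a CPU count; 1.00 means one runnable
# process per CPU on average.
normalized_load() {
    awk -v l="$1" -v n="$2" 'BEGIN { printf "%.2f\n", l / n }'
}
```

On a node this could be called as: normalized_load "$(one_minute_load)" "$(cpu_count)". A result that stays well above 1.00 would support what Ganglia is showing.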


On May 12, 2009, at 2:15 PM, rmc7777 wrote:

> Hi,
> We have a 32-node (64-CPU) Apple G5 cluster running SGE.  We use  
> Ganglia to monitor the load on the cluster.  Ganglia shows that the  
> cluster is chronically overallocated, that is, running with an  
> average load much greater than 100%.  I would like to manage the  
> load with SGE such that jobs remain in a pending state until the  
> average load drops below 100%.  When the average load drops below  
> 100% jobs could be submitted to the run queue until the load goes  
> over 100% again.  Can you do this with SGE?  How would you configure  
> the queues or queue resources to accomplish this? thx.

