[GE users] scheduling questions

Chris Dagdigian dag at sonsorol.org
Thu Jan 12 16:14:25 GMT 2006


On Jan 12, 2006, at 10:44 AM, Richard Smith wrote:

> Hi
>
> 1. Will SGE choose faster nodes over slower nodes automatically, or  
> do I have to configure this behaviour? (E.g. put faster nodes into  
> a different queue?)  So far it appears to be doing this, but I  
> wonder if it is just a side effect of the fast nodes appearing at  
> the top of the hostgroup.
>

By default, Grid Engine will first do a sort to find the best  
available queue instances that match the user/job requirements.

After it does this sort, its default behavior is to do a further sort  
in order to choose the "least busy" of the available systems. By  
default SGE is not going to sort by CPU speed. You can influence or  
customize this behavior by choosing to sort on a custom sequence  
number (where you list the faster nodes first).  There are other ways  
you can let SGE know about "relative power" between systems but I  
can't recall them (sorry!).
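
To make the difference between the two sort orders concrete, here is a
rough Python sketch. It is a toy model, not Grid Engine code: the host
names, load figures and the QueueInstance class are all made up for
illustration. It also assumes (my understanding, please check the
sched_conf documentation) that when you sort by sequence number, load is
only used to break ties between equal sequence numbers.

```python
# Toy model of the two queue sort orders discussed above.
# Illustration only, not Grid Engine code; hosts and numbers are made up.
from dataclasses import dataclass


@dataclass
class QueueInstance:
    host: str
    load_avg: float  # reported load average
    num_proc: int    # CPU count, used to normalize the load
    seq_no: int      # admin-assigned sequence number (lower = preferred)

    @property
    def np_load_avg(self) -> float:
        """Load average normalized for CPU count."""
        return self.load_avg / self.num_proc


def sort_by_load(candidates):
    """'Least busy first': order by normalized load only."""
    return sorted(candidates, key=lambda q: q.np_load_avg)


def sort_by_seqno(candidates):
    """Sequence number first; normalized load as an assumed tie-breaker."""
    return sorted(candidates, key=lambda q: (q.seq_no, q.np_load_avg))


candidates = [
    QueueInstance("fast01", load_avg=1.0, num_proc=4, seq_no=1),
    QueueInstance("slow01", load_avg=0.1, num_proc=2, seq_no=2),
    QueueInstance("fast02", load_avg=2.0, num_proc=4, seq_no=1),
]

load_order = [q.host for q in sort_by_load(candidates)]
seqno_order = [q.host for q in sort_by_seqno(candidates)]
print(load_order)
print(seqno_order)
```

With load-based sorting the idle slow node wins; with sequence numbers
the fast nodes are always tried first, whatever the slow node is doing.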

Stephan has a blog entry on a similar sort of scheduler sort tweak  
that will at least show you the parameters you will be setting. Once  
you know the params you can hit the documentation to find out how to  
enable the behavior you wish. The blog entry is here:
http://blogs.sun.com/roller/page/sgrell?entry=n1ge_6_scheduler_hacks_sorting


> 2. When the system load is over a certain value, I understand SGE  
> will not schedule jobs on a node.  How do I change this value (or  
> disable this check completely)?  I don't want CPUs sitting idle  
> because the system load happens to have gone slightly too high a  
> few minutes previously.
>

There are several alarm thresholds. The one you are probably talking  
about here is the "load alarm threshold": SGE stops scheduling to a  
node that it considers "too busy" (based on the reported load average  
normalized for CPU count), even if the node has free job slots.
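
As a rough illustration of what "normalized for CPU count" means here,
a small Python sketch follows. Again this is a toy model and not Grid
Engine code. The 1.75 threshold is an assumption on my part (I believe
it is the shipped default for np_load_avg, but check your own queue
configuration), and whether the alarm fires at or only above the
threshold is also assumed.

```python
# Toy model of a load alarm check. Illustration only, not Grid Engine code.
# ASSUMPTION: 1.75 is used because I believe it is the shipped default for
# np_load_avg; verify against your own queue configuration.
ASSUMED_THRESHOLD = 1.75


def np_load_avg(load_avg: float, num_proc: int) -> float:
    """Load average divided by the number of CPUs on the host."""
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    return load_avg / num_proc


def in_load_alarm(load_avg: float, num_proc: int,
                  threshold: float = ASSUMED_THRESHOLD) -> bool:
    """True when the normalized load reaches the threshold (assumed >=)."""
    return np_load_avg(load_avg, num_proc) >= threshold


def can_schedule(free_slots: int, load_avg: float, num_proc: int) -> bool:
    """A node takes new work only if it has a free slot and is not in alarm."""
    return free_slots > 0 and not in_load_alarm(load_avg, num_proc)


# A 4-CPU node with load 6.0 normalizes to 1.5 and stays schedulable;
# the same node at load 8.0 normalizes to 2.0 and is skipped.
print(can_schedule(free_slots=2, load_avg=6.0, num_proc=4))
print(can_schedule(free_slots=2, load_avg=8.0, num_proc=4))
```

The point is that a raw load of 6 on a 4-CPU box is nowhere near the
alarm, so a node has to be well oversubscribed before this kicks in.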

I have found that the built-in default values are perfectly fine and  
rarely get invoked. The times when I have found systems in  
load_alarm state, it has been because the systems were truly thrashing  
around, and I was happy that SGE temporarily halted scheduling further  
work on them.

I would encourage you to leave the default settings alone to see if  
they will actually get in your way. If you do need to adjust or  
disable the value, the parameter is called "load_thresholds" and is  
part of the cluster queue configuration ("qconf -sq all.q" to see  
your current settings). Read the manpage for "queue_conf (5)" to see  
the details on how this is done.

-Chris
