[GE users] scheduling questions
dag at sonsorol.org
Thu Jan 12 16:14:25 GMT 2006
On Jan 12, 2006, at 10:44 AM, Richard Smith wrote:
> 1. Will SGE choose faster nodes over slower nodes automatically, or
> do I have to configure this behaviour? (E.g. put faster nodes into
> a different queue?) So far it appears to be doing this, but I
> wonder if it is just a side effect of the fast nodes appearing at
> the top of the hostgroup.
By default, Grid Engine first sorts to find the available queue
instances that best match the user/job requirements. It then sorts
those again in order to choose the "least busy" of the available
systems. SGE does not sort by CPU speed by default. You can influence
or customize this behavior by sorting on a custom sequence number
instead (the seq_no queue attribute, with queue_sort_method set to
seqno in the scheduler configuration), where you list the faster
nodes first. There are other ways to tell SGE about the "relative
power" of different systems, but I can't recall them (sorry!).
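To make the two-stage sort a bit more concrete, here is a minimal Python sketch of how the ordering changes between the two queue_sort_method settings described in sched_conf(5): with "load" the least loaded instance wins and seq_no only breaks ties, while with "seqno" the sequence number decides and load only breaks ties. The QueueInstance class, the sort_queue_instances() helper, the host names and the load numbers are all hypothetical, and the sketch skips the request matching (including soft requests) that the real scheduler does first. In a real cluster queue_sort_method lives in the scheduler configuration (qconf -msconf) and seq_no in the queue configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueInstance:
    name: str     # hypothetical queue instance, e.g. "all.q@fast01"
    seq_no: int   # seq_no from the queue configuration (queue_conf(5))
    load: float   # result of the scheduler's load_formula (lower = less busy)


def sort_queue_instances(instances: List[QueueInstance],
                         queue_sort_method: str = "load") -> List[QueueInstance]:
    """Order already-matching queue instances by one of the two sort methods.

    "load":  least loaded first; seq_no only breaks ties.
    "seqno": lowest seq_no first; load only breaks ties.
    """
    if queue_sort_method == "load":
        return sorted(instances, key=lambda q: (q.load, q.seq_no))
    if queue_sort_method == "seqno":
        return sorted(instances, key=lambda q: (q.seq_no, q.load))
    raise ValueError(f"unknown queue_sort_method: {queue_sort_method!r}")


# Fast nodes get the lower sequence number so they sort first under "seqno".
candidates = [
    QueueInstance("all.q@slow01", seq_no=20, load=0.10),
    QueueInstance("all.q@fast01", seq_no=10, load=0.50),
    QueueInstance("all.q@fast02", seq_no=10, load=0.25),
]
print([q.name for q in sort_queue_instances(candidates, "load")])
# ['all.q@slow01', 'all.q@fast02', 'all.q@fast01']
print([q.name for q in sort_queue_instances(candidates, "seqno")])
# ['all.q@fast02', 'all.q@fast01', 'all.q@slow01']
```

Under "load" the lightly loaded slow node is picked first; under "seqno" both fast nodes are tried before it, with load only deciding between the two of them.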
Stephan has a blog entry on a similar scheduler sort tweak that will
at least show you the parameters you will be setting. Once you know
the parameters, you can check the documentation to find out how to
enable the behavior you want. The blog entry is here: http://
> 2. When the system load is over a certain value, I understand SGE
> will not schedule jobs on a node. How do I change this value (or
> disable this check completely)? I don't want CPUs sitting idle
> because the system load happens to have gone slightly too high a
> few minutes previously.
There are several alarm thresholds; the one you are probably talking
about here is the "load alarm threshold", the point at which SGE stops
scheduling to a node it considers "too busy" (based on the reported
load average normalized for CPU count) even if the node has free job
slots. I have found that the built-in default values are perfectly
fine and rarely get invoked. The times I have found systems in the
load_alarm state, it was because they were truly thrashing, and I was
happy that SGE temporarily stopped scheduling further work on them.
I would encourage you to leave the default settings alone and see
whether they actually get in your way. If you do need to adjust or
disable the value, the parameter is called "load_thresholds" and is
part of the cluster queue configuration (run "qconf -sq all.q" to see
your current settings). Read the queue_conf(5) manpage for the details
on how this is done.
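As a rough illustration of the load-threshold check, the following Python sketch assumes the common all.q default of np_load_avg=1.75 (verify it against your own "qconf -sq all.q" output) and treats a threshold as reached when the normalized load is greater than or equal to it. The function names are hypothetical, not part of Grid Engine, and passing None stands in for setting load_thresholds to NONE, which as I understand queue_conf(5) disables the check.

```python
from typing import Dict, Optional

# Assumed default threshold for all.q; check `qconf -sq all.q` on your cluster.
DEFAULT_LOAD_THRESHOLDS = {"np_load_avg": 1.75}


def np_load_avg(load_avg: float, num_proc: int) -> float:
    """Load average divided by the host's processor count."""
    if num_proc <= 0:
        raise ValueError("num_proc must be positive")
    return load_avg / num_proc


def in_load_alarm(load_values: Dict[str, float],
                  thresholds: Optional[Dict[str, float]]) -> bool:
    """Return True if any configured load threshold has been reached.

    thresholds=None (or an empty dict) models `load_thresholds NONE`.
    A threshold counts as reached when value >= limit (assumed ">=").
    Load values the host does not report are ignored.
    """
    if not thresholds:
        return False
    return any(
        name in load_values and load_values[name] >= limit
        for name, limit in thresholds.items()
    )


# Hypothetical 4-CPU host at three different load averages.
for load in (6.0, 7.0, 8.0):
    values = {"np_load_avg": np_load_avg(load, num_proc=4)}
    print(load, in_load_alarm(values, DEFAULT_LOAD_THRESHOLDS))
# 6.0 False  (np_load_avg 1.5)
# 7.0 True   (np_load_avg 1.75, threshold reached)
# 8.0 True   (np_load_avg 2.0)

# With the check disabled the host is never in the alarm state.
print(in_load_alarm({"np_load_avg": 2.0}, None))  # False
```

Normalizing by CPU count is what lets a single threshold value work across hosts with different numbers of processors.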