[GE users] node load and selection

templedf daniel.templeton at oracle.com
Fri Apr 2 15:14:38 BST 2010



If you want to cram your jobs in like sardines, set
job_load_adjustments to NONE.  In general, you want job_load_adjustments
* slots to be less than the value in load_thresholds.  You can tweak that
at either end.  You should set load_adjustment_decay_time according to
the average execution profile of your workload: how long does it take
the average app to reach full CPU consumption?
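
To make the arithmetic concrete, here is a rough Python sketch of the
rule of thumb. It is not Grid Engine code. It assumes the virtual load
decays linearly to zero over the decay time and that the adjustment is
added to np_load_avg as-is, with no per-processor scaling. The function
names are made up for illustration; the defaults are the values
mentioned in this thread (0.5, 7.5 minutes, 0.95).

```python
def virtual_load(job_ages_s, adjustment=0.5, decay_time_s=450.0):
    """Remaining virtual load for jobs started job_ages_s seconds ago.

    Assumption: each job's adjustment decays linearly from its full
    value at job start to zero once decay_time_s has passed.
    """
    if decay_time_s <= 0:
        return 0.0
    total = 0.0
    for age in job_ages_s:
        remaining = max(0.0, 1.0 - age / decay_time_s)
        total += adjustment * remaining
    return total


def host_below_threshold(reported_load, job_ages_s, threshold=0.95,
                         adjustment=0.5, decay_time_s=450.0):
    """True if reported load plus virtual load stays under the threshold."""
    effective = reported_load + virtual_load(
        job_ages_s, adjustment, decay_time_s)
    return effective < threshold


if __name__ == "__main__":
    # One freshly started job on an idle host, then two.
    print(host_below_threshold(0.0, [0]))
    print(host_below_threshold(0.0, [0, 0]))
    # Same two jobs with the adjustment switched off (adjustment=0.0).
    print(host_below_threshold(0.0, [0, 0], adjustment=0.0))
```

Under these assumptions, two jobs started at the same moment on an idle
host already carry 2 * 0.5 = 1.0 of virtual load, which is above a 0.95
threshold. With the adjustment at zero, the same host stays below it.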

Daniel

On 04/01/10 23:35, igardais wrote:
> Hi Daniel,
>
> That's correct.
> Should I lower job_load_adjustments, load_adjustment_decay_time, or
> both?
>
> Thanks,
> Ionel
>
>
> ------------------------------------------------------------------------
> *From:* templedf <daniel.templeton at oracle.com>
> *To:* users at gridengine.sunsource.net
> *Sent:* Thu 1 April 2010, 15:44:21
> *Subject:* Re: [GE users] node load and selection
>
> I think the explanation he's looking for is job_load_adjustments. By
> default, Grid Engine is set with job_load_adjustments as
> np_load_avg=0.5. That means that for every job placed on a node, Grid
> Engine adds 0.5 to that host's np_load_avg virtually. That load decays
> over time (7.5 minutes by default). The idea is that jobs tend to ramp
> up their resource usage (CPU in this case) over time. Instead of making
> scheduling decisions based on the jobs' initial resource usage, we add
> the virtual load to give them a little buffer space. The net effect is
> that Grid Engine doesn't pack jobs in as tightly as you might at first
> expect.
>
> Daniel
>
> On 04/01/10 02:19, reuti wrote:
>  > Hi,
>  >
>  > On 01.04.2010 at 08:26, igardais wrote:
>  >
>  >> Some of my users do not understand how SGE selects (or does not
>  >> select) nodes for execution.
>  >> In my setup, the PE they use is set to $fill_up and the queue is set
>  >> to 'seq_no 0' (all nodes are equal) and 'load_thresholds
>  >> np_load_avg=0.95' (a node has to be fully loaded to be put in the
>  >> 'alarm' state).
>  >>
>  >> According to them, jobs that could have run on 2 nodes are split
>  >> over 3 or 4 nodes (not always, but sometimes).
>  >>
>  >> What can I do to avoid this?
>  >> Lower the load_thresholds?
>  >> What explanation can I give them? I'd rather not go too deep into
>  >> the SGE selection algorithm ...
>  >
>  > Do you request any other resources? Is there any load on the other
>  > machines? You can force $fill_up by setting up this:
>  >
>  > http://blogs.sun.com/sgrell/entry/grid_engine_scheduler_hacks_least
>  >
>  > -- Reuti
>  >
>  >
>  >> Thanks,
>  >> Ionel
>  >>
>  >>
>  >
>  > ------------------------------------------------------
>  >
>  > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=252027
>  >
>  > To unsubscribe from this discussion, e-mail:
>  > [users-unsubscribe at gridengine.sunsource.net].
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=252043
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=252129



