[GE users] node load and selection

templedf daniel.templeton at oracle.com
Thu Apr 1 14:44:21 BST 2010

I think the explanation he's looking for is job_load_adjustments.  By 
default, the Grid Engine scheduler is configured with job_load_adjustments 
set to np_load_avg=0.5.  That means that for every job placed on a node, 
Grid Engine adds a virtual 0.5 to that host's np_load_avg.  That virtual 
load decays over time (7.5 minutes by default).  The idea is that jobs tend 
to ramp up their resource usage (CPU in this case) over time.  Instead of 
making scheduling decisions based on the jobs' initial resource usage, we 
add the virtual load to give them a little buffer space.  The net effect is 
that Grid Engine doesn't pack jobs in as tightly as you might at first 
expect.
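
To make the arithmetic concrete, here is a rough Python sketch of the idea. 
It is not Grid Engine code: the function names are made up for illustration, 
and it assumes the virtual load decays linearly from the full 0.5 at job 
start down to 0 at the end of the 7.5 minute window.

```python
# Illustrative model only; names below are hypothetical, not Grid Engine APIs.
ADJUSTMENT = 0.5          # job_load_adjustments np_load_avg=0.5 (default per the text)
DECAY_SECONDS = 7.5 * 60  # decay window of 7.5 minutes (default per the text)


def virtual_load(job_ages_seconds, adjustment=ADJUSTMENT, decay=DECAY_SECONDS):
    """Sum the virtual load still attributed to recently started jobs.

    Assumption: each job's adjustment shrinks linearly with the job's age
    and reaches zero once the decay window has passed.
    """
    total = 0.0
    for age in job_ages_seconds:
        remaining_fraction = max(0.0, 1.0 - age / decay)
        total += adjustment * remaining_fraction
    return total


def effective_load(measured_np_load_avg, job_ages_seconds):
    """Load the scheduler would compare against a threshold in this model."""
    return measured_np_load_avg + virtual_load(job_ages_seconds)


if __name__ == "__main__":
    # A nearly idle host (np_load_avg 0.10) that just received two jobs.
    print(effective_load(0.10, [0, 0]))
    # The same host once both jobs are older than the decay window.
    print(effective_load(0.10, [600, 600]))
```

In this model, a nearly idle host that has just been handed two jobs already 
looks busier than a load_thresholds setting of np_load_avg=0.95, so the next 
job goes elsewhere until the virtual load has decayed or the real load has 
caught up.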


On 04/01/10 02:19, reuti wrote:
> Hi,
> On 01.04.2010 at 08:26, igardais wrote:
>> Some of my users do not understand how SGE selects (or does not select) nodes for execution.
>> In my setup, the PE they use is set to $fill_up, and the queue is set to 'seq_no 0' (all nodes are equal) and 'load_thresholds np_load_avg=0.95' (a node has to be fully loaded to be put in the 'alarm' state).
>> According to them, jobs that could have run on 2 nodes are split over 3 or 4 nodes (not always, but sometimes).
>> What can I do to avoid this?
>> Lower the load_thresholds?
>> What explanation can I give them? I'd rather not go too deep into SGE's selection algorithm ...
> Do you request any other resources? Is there any load on the other machines? You can force $fill_up behavior with the setup described here:
> http://blogs.sun.com/sgrell/entry/grid_engine_scheduler_hacks_least
> -- Reuti
>> Thanks,
>> Ionel
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=252027
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

