[GE users] Calculation of load average accurately

Reuti reuti at staff.uni-marburg.de
Tue Aug 10 23:30:48 BST 2004


In my opinion, load_threshold is most useful on an SMP machine with, e.g., 64 
CPUs, when you know that not all parallel programs are actually running in 
parallel all the time. Then you could create one queue with about 72 slots and 
set load_threshold to 64.
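
To make the interplay of slots and load_threshold concrete, here is a minimal sketch in Python. It is a simplified model and not how the scheduler is implemented: the function name can_start_job and the sample load values are made up for illustration, and it assumes that a queue stops taking jobs once the load reaches the threshold.

```python
def can_start_job(used_slots, total_slots, load_avg, load_threshold):
    """Simplified model: the queue takes another job only if a slot
    is still free and the load is below the threshold."""
    return used_slots < total_slots and load_avg < load_threshold


# 64-CPU SMP host with 72 slots and load_threshold 64 (values from the text);
# the used-slot counts and load averages are invented examples.
print(can_start_job(66, 72, 58.0, 64.0))  # slot free, load below threshold
print(can_start_job(66, 72, 64.5, 64.0))  # slot free, but load too high
print(can_start_job(72, 72, 40.0, 64.0))  # load fine, but no slot left
```

The oversubscription (72 slots on 64 CPUs) only pays off while some of the jobs are idle; as soon as the load reaches 64, the threshold stops further jobs regardless of free slots.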

When you have dual machines, a setup with one queue and two slots is just 
fine, and you could delete the load_threshold entry from the queue 
definition. If you want to have more than one queue (to organize the setup) 
and still limit the total number of jobs on one machine to the number of CPUs 
(i.e. 2), you could create a consumable complex cpu_slots and set it to two on 
all nodes.

#name            shortcut   type   value           relop requestable consumable 
cpu_slots        cu         INT    0               <=    NO         YES        
##--- # starts a comment but comments are not saved across edits 

For each node:

complex_values             cpu_slots=2

This way, there will always be a limit of two jobs on each machine. I hope 
this is what you want to achieve.
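
The bookkeeping behind such a consumable can be sketched in a few lines of Python. This is only an illustration of the idea, not Grid Engine code: the class Host, its methods, and the queue names are hypothetical, and it assumes that every job uses up exactly one cpu_slots unit on the host, no matter which queue it runs in.

```python
class Host:
    """Toy model of a host-level consumable shared by all queues."""

    def __init__(self, name, cpu_slots=2):
        self.name = name
        self.cpu_slots = cpu_slots  # mirrors complex_values cpu_slots=2
        self.running = {}           # job name -> queue name

    def try_start(self, job, queue):
        # Assumption: each job consumes one unit of cpu_slots.
        if self.cpu_slots < 1:
            return False
        self.cpu_slots -= 1
        self.running[job] = queue
        return True

    def finish(self, job):
        # Return the unit to the host when the job ends.
        del self.running[job]
        self.cpu_slots += 1


node = Host("node01")
print(node.try_start("job_a", "short.q"))  # first job fits
print(node.try_start("job_b", "long.q"))   # second job, other queue, fits
print(node.try_start("job_c", "short.q"))  # third job is refused
node.finish("job_a")
print(node.try_start("job_c", "short.q"))  # fits after job_a has ended
```

The point is that the counter lives on the host and not in the queue, so the limit of two holds across all queues defined on that machine.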

Cheers - Reuti

>What is the correct way to define the load average in Sun Grid
>Engine 5.3ee? Currently, on my cluster, which consists of 64 nodes, all with
>dual Pentium 4 3.2 GHz processors, we are using np_load_average in the
>load formula, and the threshold set as of now is 1.75.
>What should the load formula (np_load_average) be, and what should the
>adjustment be? (Currently 0.50, with load threshold np_load_average 1.75.)
>New jobs are not submitted to the queue if np_load_average is > 1.75 on any
>of the nodes, whereas if I log on to my compute nodes I see that the nodes
>are very free and the CPUs are mostly idle, since the jobs only start and use
>10-20% of each CPU. When I locally execute programs to create artificial
>load, the load average goes to 5 and even 7, and that is when I see my node
>a little busy.

BTW: Load adjustment creates artificial load, so that the load average is 
higher immediately after a job starts, which keeps another job from being 
scheduled to the same machine right away. It decays over time (until the load 
average reflects the actual usage of the machine); you set this up in the 
scheduler configuration. With the above setup it could also be switched off:

job_load_adjustments       NONE
load_adjustment_decay_time 0:0:00
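
To illustrate what the load adjustment does before it is switched off, here is a rough sketch in Python. It assumes that the artificial load of each job fades out linearly over the decay time, which is a simplification on my part; the function name adjusted_load and the decay time of 450 seconds are just illustrative, while the adjustment of 0.50 is the value from the question.

```python
def adjusted_load(measured, job_ages, adjustment=0.5, decay_time=450.0):
    """Measured load plus the artificial load of recently started jobs.

    job_ages are seconds since each job started. Assumption: the
    adjustment of a job decays linearly to zero over decay_time.
    """
    extra = 0.0
    for age in job_ages:
        if age < decay_time:
            extra += adjustment * (1.0 - age / decay_time)
    return measured + extra


# Two jobs just started: the full adjustment is added for each of them.
print(adjusted_load(1.0, [0, 0]))
# Both jobs are older than the decay time: only the measured load is left.
print(adjusted_load(1.0, [450, 900]))
```

With a decay time of 0:0:00 and job_load_adjustments NONE, the extra term is always zero, and only the measured load (and the cpu_slots limit) decides.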

>Another thing I noticed, after which I saw the underutilization of my
>cluster, is that once I set up channel bonding (that is, teaming two NICs
>to act as one), the load average on my Linux boxes jumped to 1.0 1.0
>1.0 as a minimum, even when no processes are running and I see the CPUs as
>100% free. This affected the number of jobs being submitted to
>the node, because Sun Grid Engine thought that the node was already loaded.
>So my question is: is there any other way to evaluate the load on a node, or
>how should I go about setting the right threshold for a dual Pentium 4 (3.2
>GHz), which is set to 1.75 right now?
