[GE users] max array job tasks - and load

Reuti reuti at Staff.Uni-Marburg.DE
Fri Oct 31 12:21:42 GMT 2008


On 31.10.2008 at 01:29, Joseph Hargitai wrote:

> Our nodes have 16 cores. We will be using 20 of them for job arrays  
> and serial jobs using a distinct queue and 60 nodes in another  
> queue for parallel jobs.
> What is the best way to control load threshold on the serial/job  
> array part? Usually on other clusters we like processes to equal  
> core count - such we set load limit in pbs to 7.8 or 8 for 8 core  
> nodes. For this cluster we would set it to 15.5, 16.
> It appears you can set/control load in two ways, possibly many more:
> a, via load/suspend thresholds
> I am not clear on the nomenclature in SGE regarding the np_load.avg  
> usage. What is a np_load_av = 1.75 mean on a quadsocket quadcore  
> node (16 cores)? Will SGE schedule jobs up to load 16x1.75? In  
> other words, I would like to set a parameter to stop scheduling  
> more jobs to a node when the node reaches load 16.

Yes, the limit would be 16 x 1.75, since np_load_avg is the load average normalized by the number of processors. Once the threshold is exceeded, the queue will be put into "alarm" state and no further jobs will be scheduled to it until the load drops below the threshold again.
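For illustration, such a threshold is set in the queue configuration (queue name here is hypothetical); a minimal sketch:

```
# queue configuration excerpt, shown/edited via
# `qconf -sq serial.q` / `qconf -mq serial.q` (queue name hypothetical)
load_thresholds       np_load_avg=1.75
```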

But usually this feature is most useful on big SMP machines with 64 and more cores, where not all parallel programs are really running in parallel all the time. Certain parallel applications have serial steps and would only block the cores, although nothing is running on them. So the idea is to oversubscribe the nodes until they reach a load a little higher than the number of installed cores. IMO 1.75 is too high; I would suggest 1.25 to 1.5 or so, but it depends of course on your environment and intention.

With 16 cores per machine it might start to make sense to use this feature, though.
The usual approach for a setup is to have slots = cores. Especially as in Linux nowadays processes in state "D" also count toward the load, the load average gives no reliable indication of the number of processes actually running on the system. So you wouldn't need these load/suspend threshold features at all. I always set load_thresholds to NONE in my clusters.
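A sketch of this plain setup as a queue configuration excerpt (the slot count is an example for your 16-core nodes):

```
# queue configuration excerpt for 16-core nodes (values are examples)
slots                 16
load_thresholds       NONE
suspend_thresholds    NONE
```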

To cope with the case of the sleeping parallel tasks, we have an additional queue called "background", with a slots count of half of the cores in the machines and a nice value of 19 set in the queue definition. This means these background jobs will only get CPU time when a parallel job is currently in a serial step and its slave processes aren't consuming the cores.
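A hypothetical queue definition along these lines (name and values assumed, not taken from our actual setup):

```
# `qconf -sq background` excerpt (hypothetical)
qname                 background
slots                 8          # half of the 16 cores
priority              19         # nice value applied to jobs in this queue
load_thresholds       NONE
```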

If you really want to bind parallel jobs to certain nodes and serial ones to others, you can still live with one queue by setting:

qtype                 BATCH INTERACTIVE,[@hostgroup1=NONE]
pe_list               NONE,[@hostgroup1=make mpich openmp]
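The referenced host group would be defined separately, e.g. (host names hypothetical):

```
# `qconf -shgrp @hostgroup1` (host names hypothetical)
group_name @hostgroup1
hostlist   node01 node02 node03
```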

Another approach might be to limit the serial and parallel jobs by using an RQS, but let the slots be collected from the complete cluster. Binding some nodes exclusively to parallel jobs seems a little bit inflexible to me.


If you want to have as many processes as possible on the same node, then the intended setup with two queues would be good, if you also set the queues up in such a way that the serial jobs fill the cluster from one side while the parallel jobs fill it from the other side.

Both queues can still contain all cluster nodes.
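One way to get this filling order is the scheduler's sequence-number sorting: set queue_sort_method to seqno in the scheduler configuration and give the two queues opposite per-host sequence numbers. A sketch with two hypothetical hosts:

```
# scheduler configuration (`qconf -msconf`)
queue_sort_method     seqno

# serial queue: prefer node01 first (`qconf -mq serial.q`)
seq_no                0,[node02=1]

# parallel queue: prefer node02 first (`qconf -mq parallel.q`)
seq_no                0,[node01=1]
```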

> b, configure max array job tasks -
> can you set this per host or only globally? It would be very useful  
> If you could set this per host in addition to a global total.

Do you mean max_aj_tasks? It's only global (see `man sge_conf`). But I also don't see the purpose of having it per node - it's the maximum number of tasks you can request in qsub for an array job. Each task of an array job is just like a normal job, and you could use a queue setting like slots or an RQS setup to limit how many of its tasks execute at once.
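For example, a per-user cap on concurrently running tasks could look like this RQS sketch (name and limit are assumptions); a job submitted with `qsub -t 1-1000 job.sh` would then have at most 32 of its tasks running at a time:

```
{
   name         max_user_tasks
   description  "at most 32 running slots per user"
   enabled      TRUE
   limit        users {*} to slots=32
}
```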

What setup do you need in detail?

-- Reuti

To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net
