[GE users] max array job tasks - and load
reuti at Staff.Uni-Marburg.DE
Fri Oct 31 12:21:42 GMT 2008
Am 31.10.2008 um 01:29 schrieb Joseph Hargitai:
> Our nodes have 16 cores. We will be using 20 of them for job arrays
> and serial jobs using a distinct queue and 60 nodes in another
> queue for parallel jobs.
> What is the best way to control load threshold on the serial/job
> array part? Usually on other clusters we like processes to equal
> core count - such we set load limit in pbs to 7.8 or 8 for 8 core
> nodes. For this cluster we would set it to 15.5, 16.
> It appears you can set/control load in two ways, possibly many more:
> a, via load/suspend thresholds
> I am not clear on the nomenclature in SGE regarding the np_load_avg
> usage. What does np_load_avg = 1.75 mean on a quad-socket quad-core
> node (16 cores)? Will SGE schedule jobs up to load 16x1.75? In
> other words, I would like to set a parameter to stop scheduling
> more jobs to a node when the node reaches load 16.
Yes, the limit would be 16x1.75. Then the queue will be put into
the "alarm" state and receive no new jobs until the load is lower again.
But usually this feature is most useful on big SMP machines with 64
or more cores, where not all parallel programs are really running in
parallel all the time. Certain parallel applications have serial
steps and would otherwise only block the cores, although nothing is
running on them. So the idea is to oversubscribe the nodes until
they reach a load a little higher than the number of installed
cores. IMO 1.75 is too high; I would suggest something like 1.25 to
1.5, but it depends of course on your environment and intention.
With 16 cores per machine it might start to make sense to use this
feature.
The usual approach for a setup is to have slots = cores, especially
as in Linux nowadays processes in state "D" also count towards the
load, so the load average gives no reliable indication of the number
of processes actually running in the system right now. Then you
wouldn't need the load/suspend thresholds at all. I always set
load_thresholds to NONE in my clusters.
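In the queue definition this boils down to (16-core nodes assumed):

   slots              16
   load_thresholds    NONE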
To cope with the case of the sleeping parallel tasks, we have an
additional queue called "background", with a slots count of half of
the cores in the machines and a nice value of 19 set in the queue
definition. This means these background jobs will only run when the
parallel job is currently in a serial step and its slave processes
are sleeping.
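A sketch of such a queue definition (the slots value is just an
example for 16-core nodes; "priority" in queue_conf(5) is the nice
value applied to the jobs):

   qname              background
   slots              8
   priority           19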
If you really want to bind parallel jobs to certain nodes and serial
ones to others, you can still live with one queue by setting:
qtype BATCH INTERACTIVE,[@hostgroup1=NONE]
pe_list NONE,[@hostgroup1=make mpich openmp]
Another approach might be to limit the serial and parallel jobs by
using an RQS, but let the slots be collected from the complete
cluster. Binding some nodes exclusively to parallel jobs seems a
little bit too rigid to me.
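A sketch of such an RQS (the queue names and slot counts are made up;
see sge_resource_quota(5) for the rule syntax):

   {
      name     cap_job_types
      enabled  TRUE
      limit    queues serial.q to slots=320
      limit    queues parallel.q to slots=960
   }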
If you want to have as many processes as possible on the same node,
then the intended setup with two queues would be good, if you also
set the queues up in such a way that the serial jobs fill the
cluster from one side while the parallel jobs fill it from the
other side. Both queues can still contain all of the cluster's nodes.
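One way to get this fill-from-opposite-ends behaviour (host group and
queue names are made up) is to sort queue instances by sequence number
in the scheduler configuration (qconf -msconf):

   queue_sort_method  seqno

and then give the two queues opposite per-host-group orderings in
their seq_no lines, e.g.:

   seq_no  0,[@left=10],[@right=20]     (in serial.q)
   seq_no  0,[@left=20],[@right=10]     (in parallel.q)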
> b, configure max array job tasks -
> can you set this per host or only globally? It would be very useful
> If you could set this per host in addition to a global total.
Do you mean max_aj_tasks? It's only global (see `man sge_conf`). But
I also don't see the purpose of having it per node - it's the
maximum number of tasks you can request in qsub for an array job.
Each task of an array job is just like a normal job, and you could
use a queue setting like slots or an RQS setup to limit how many of
its tasks run at once.
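For example (the user name and limit are made up), a small RQS would
cap how many tasks of one user's jobs, array or not, run at once:

   {
      name     limit_user_slots
      enabled  TRUE
      limit    users joseph to slots=32
   }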
What setup do you need in detail?