[GE users] Dynamic queue -- sge schedule policy
jtseng at montalvosystems.com
Fri Aug 25 17:02:37 BST 2006
We have cases where we want to suspend jobs and cases where we don't.
The suspend case is well documented.
For the non-suspend case, we use no subordinate_list. Instead, we make sure that 4G jobs go to 4G machines first.
We do this by marking the 4G machines with a boolean complex "machine_4g".
You can set this attribute through a host load_sensor (this takes a bit of scripting).
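A minimal sketch of such a load sensor, assuming Linux hosts (it reads MemTotal from /proc/meminfo) and treating any host with at most 4 GiB of RAM as a "4G machine"; the threshold and all function names are illustrative, not part of Grid Engine. It follows the load sensor protocol: sge_execd writes a line to the sensor's stdin whenever it wants a report (or "quit" when the sensor should exit), and the sensor answers with "begin", one host:complex:value line per value, and "end". The boolean complex machine_4g still has to be defined (for example with qconf -mc) before the reported values can be requested.

```shell
#!/bin/sh
# Sketch of a Grid Engine host load sensor that reports the boolean complex
# machine_4g. Assumptions: Linux (/proc/meminfo) and "4G machine" means
# MemTotal <= 4 GiB; adjust LIMIT_KB for your site.

LIMIT_KB=$((4 * 1024 * 1024))   # 4 GiB in kB, the unit /proc/meminfo uses

# Total physical memory of this host in kB.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Print 1 if a MemTotal value (kB) falls in the 4G class, else 0.
classify_mem() {
    if [ "$1" -le "$LIMIT_KB" ]; then echo 1; else echo 0; fi
}

# Emit one report in load sensor format: begin, host:name:value, end.
report_block() {
    echo "begin"
    echo "$1:machine_4g:$2"
    echo "end"
}

# Protocol loop: answer every request line until sge_execd sends "quit".
run_sensor() {
    host=$(uname -n)   # must match the host name Grid Engine uses (assumption)
    while read -r request; do
        [ "$request" = "quit" ] && return 0
        report_block "$host" "$(classify_mem "$(mem_total_kb)")"
    done
}

# Installed as the host load_sensor, the script would end with the call:
#   run_sensor
```

The protocol loop is kept in a function so the classification and report format can be tried in isolation before the script is configured as the execution host's load_sensor.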
If you'd rather not write a host load_sensor, you can instead mark the @host_4G machines in the 4G queue with machine_4g=1. (In Qmon, you can add a queue-specific hostgroup setting and attach the boolean attribute there.)
When you submit a 4G job, submit it with a soft request of machine_4g=1.
The scheduler will dispatch 4G jobs to 4G machines first, and then 'roll over' to 16G machines.
In this context, a qsub would look like this:
qsub -q mem4G.q -soft -l machine_4g=1 -hard myscript
P.S. On a different tangent: we don't have separate queues; instead we make sure that each job gets enough memory. So our submissions look like this:
qsub -soft -l machine_4g=1 -hard -l mem=4G,mem_free=4G myscript
mem_free=4G is a load value and guarantees that, at the time of dispatch, that much memory is free. There could be non-SGE jobs running that take memory away, and we don't want to swap.
mem=4G is a consumable complex and is used for accounting purposes. On a 4G machine, it guarantees that only one 4G job (or four 1G jobs) will land on the machine; if the machine had 8G, two 4G jobs could land on it. You can set the machine's capacity in the queue via the hostgroup (similar to machine_4g). If you have many queues and/or many machines, as we do, put this in the host load_sensor instead and have it run qconf -mattr exechost complex_values mem=4G <hostname>.
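The submission scheme and the consumable arithmetic can be sketched as below. This assumes the site defines a consumable "mem" and a boolean "machine_4g" complex as described, and that jobs of 4G or less are the ones that should prefer 4G machines; mem_qsub, jobs_that_fit and the DRY_RUN switch are hypothetical helpers, not Grid Engine commands. The point it illustrates is switch ordering: -soft turns the requests after it into preferences, and -hard switches back to mandatory requests.

```shell
#!/bin/sh
# Sketch of the submission scheme above. Assumptions: complexes "mem"
# (consumable) and "machine_4g" (boolean) exist; jobs <= 4G prefer 4G hosts.

# Build the qsub command for a job that needs SIZE_GB gigabytes.
# Prints the command by default; DRY_RUN=0 (hypothetical switch) submits it.
mem_qsub() {
    size_gb=$1
    script=$2
    set -- qsub
    if [ "$size_gb" -le 4 ]; then
        # -soft applies to the requests that follow it, so this is only a
        # preference: the job can still roll over to bigger machines.
        set -- "$@" -soft -l machine_4g=1
    fi
    # -hard makes the following requests mandatory: the consumable used for
    # accounting and the free-memory load value checked at dispatch time.
    set -- "$@" -hard -l "mem=${size_gb}G,mem_free=${size_gb}G" "$script"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Consumable arithmetic: how many whole jobs of JOB_GB fit on a host whose
# mem capacity is HOST_GB (integer division).
jobs_that_fit() {
    echo $(( $1 / $2 ))
}
```

For example, mem_qsub 4 myscript prints qsub -soft -l machine_4g=1 -hard -l mem=4G,mem_free=4G myscript, and jobs_that_fit 8 4 prints 2, matching the two 4G jobs on an 8G machine.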
On Fri, Aug 25, 2006 at 02:39:14PM +0300, Juha Jäykkä wrote:
> > In the 16G queue, the queue definition should contain:
> > subordinate_list my_4G_queue
> We have the same situation, but we are *not* comfortable suspending jobs.
> Is this necessary? I realise that not suspending them will mean the big
> jobs have to wait before the bigmem nodes are free, but we're willing to
> live with that.