[GE users] Dynamic queue -- sge schedule policy

Juha Jäykkä juhaj at iki.fi
Mon Aug 28 08:35:04 BST 2006



> We do this by marking the 4G machines with a boolean complex,
> "machine_4g". You can do this through the host load_sensor (a bit of
> scripting is involved here). If you'd rather not write a host
> load_sensor, you can instead mark the @host_4G machines in the 4G queue
> with machine_4g=1. (In Qmon, you can add a queue-specific hostgroup and
> attach the boolean attribute.)
> 
> When you submit a 4G job, it needs to be submitted with a soft request
> of mem4g=1. The scheduler will dispatch 4G jobs to 4G machines first,
> and then 'roll over' to 16G machines.
> 
> In this context, a qsub would look like this:
> qsub -q mem4G.q -soft -l mem4g=1 -hard  myscript
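
For reference, the quoted recipe can be sketched concretely. This is only an illustration, not a tested configuration: I assume the complex is named machine_4g (the quote also writes it as mem4g, which I take to be the same complex), and node01 stands in for a 4G host.

```shell
# Add the boolean complex via "qconf -mc" (one line in the complex list):
#   machine_4g   m4g   BOOL   ==   YES   NO   FALSE   0

# Mark each 4G host (hostname is a placeholder):
qconf -mattr exechost complex_values machine_4g=TRUE node01

# Soft-request the flag, so 4G machines are preferred but jobs can
# still roll over to 16G machines when the 4G ones are full:
qsub -q mem4G.q -soft -l machine_4g=1 -hard myscript
```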

This seems rather complicated - especially for the end user who never
remembers the proper qsub parameters anyway. =)

Is there not an easier way? Reuti suggested a simpler solution, but I'm
afraid of what happens if we go from sorting by load to sorting by
seq_no.

> PS, on a different tangent: we don't have separate queues; we just keep
> track that each job gets enough memory.  So our submissions look like
> 
> qsub -soft -l machine_4g=1 -hard -l mem=4G,mem_free=4G myscript
> 
> mem_free=4G is a load value, and guarantees that at the time of
> dispatch that much memory is free.  There could be non-SGE jobs running
> that 'take away' memory, and we don't want to swap.
> 
> mem=4G is a consumable complex and is used for accounting purposes.
> This guarantees that only 1 4G job will land on the machine (or 4 1G
> jobs).  If the machine had 8G, then 2 4G jobs could land on it.  You
> can set this in the queue via the hostgroup (similar to machine_4g).
> If you have many queues and/or many machines like we do, then we would
> put this in the host load_sensor and have it do a "qconf -mattr
> exechost complex_values mem=4G".
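
As a sketch of the consumable setup described above (again node01 is a placeholder hostname, and the exact complex definition line is my assumption about the usual layout of a mem consumable):

```shell
# Define "mem" as a consumable via "qconf -mc":
#   mem   mem   MEMORY   <=   YES   YES   0   0

# Give each host its capacity; an 8G host configured with mem=8G could
# then accept two 4G jobs before the consumable is exhausted:
qconf -mattr exechost complex_values mem=4G node01

# Request both the load value (mem_free, checked at dispatch time) and
# the consumable (mem, booked for the job's lifetime):
qsub -soft -l machine_4g=1 -hard -l mem=4G,mem_free=4G myscript
```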

This sounds good as well, except that we're not (currently?) interested
in accounting; we just want to make sure three things happen:

1) no job takes more memory from a node than is appropriate (that is, 1 G
per CPU core on most machines and 4 G per core on 16G nodes)

2) 16G machines stay free of small jobs as long as there are free CPUs in
small-memory nodes

3) The machines are filled first according to 2) and then according to
lowest load.

Point 1) above is enforced by h_vmem or the like - the only problem being
that some programs put their data on the stack and thus effectively
circumvent the vmem limit. =( It is not a big problem, though. Numbers 2)
and 3) are what I'm trying to configure at the moment.
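
For 2) and 3), the seq_no approach Reuti suggested would look roughly like this. This is an untested sketch with made-up queue names (small.q, big16G.q); if I recall the scheduler semantics correctly, load still breaks ties between queues with the same sequence number, so 3) should survive the switch:

```shell
# In the scheduler configuration ("qconf -msconf") set:
#   queue_sort_method   seqno

# Give the small-memory queues a lower sequence number than the 16G
# queue so they fill first; within the same seq_no, the scheduler
# still sorts by load:
qconf -mattr queue seq_no 10 small.q
qconf -mattr queue seq_no 20 big16G.q
```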

-Juha

-- 
                 -----------------------------------------------
                | Juha Jäykkä, juolja at utu.fi                 |
                | Laboratory of Theoretical Physics             |
                | Department of Physics, University of Turku    |
                | home: http://www.utu.fi/~juolja/              |
                 -----------------------------------------------




