[GE users] preemption and consumables
reuti at staff.uni-marburg.de
Fri Oct 10 10:56:02 BST 2008
On 10.10.2008 at 03:39, Serge Nosov wrote:
> My cluster consists of 4-core machines. There are two queues in the
> cluster: "long" and "short". "long" is subordinate to "short".
> To fully utilize all the cores, I set up 4 slots for each queue on
> each system. To avoid oversubscribing of the resources, I set up a
> consumable "memory".
Often the built-in complexes "h_vmem" and "virtual_free" are made
consumable and used for this, but you can get the same result with
your custom complex. The only difference is with "h_vmem": it is also
enforced as a limit, so jobs are killed when they exceed the
requested amount.
> Each host has 16g of "memory" and each job by default uses 4g. This
> way no more than 4 jobs using 4g of "memory" are allowed to run on
> any node. The problem arises, however, when a job is submitted to a
> "short" queue to a node that runs 4 "long" queue slots. I want the
> short job to preempt one of the long jobs. This does not happen
> because there is no more "memory" consumable available. So the
> short job ends up waiting.
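To make the accounting explicit, here is a minimal Python sketch of the situation described above. It is only a toy model, not SGE code: the Host class and try_dispatch method are hypothetical names, and it assumes that a job is dispatched only while the consumable still covers its request, and that subordination can only act once the short job has actually been dispatched.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy model of one execution host with a single consumable (hypothetical)."""
    memory_total: int  # consumable capacity in GB (16 in the post)
    running: list = field(default_factory=list)  # (queue, memory) pairs

    def memory_free(self) -> int:
        # Capacity minus everything already booked by dispatched jobs.
        return self.memory_total - sum(mem for _, mem in self.running)

    def try_dispatch(self, queue: str, memory: int) -> bool:
        # Assumption: dispatch succeeds only if the consumable covers the request.
        if memory > self.memory_free():
            return False
        self.running.append((queue, memory))
        return True


host = Host(memory_total=16)
long_results = [host.try_dispatch("long", 4) for _ in range(4)]
short_result = host.try_dispatch("short", 4)
print(long_results, host.memory_free(), short_result)
```

In this model the four long jobs book all 16g, so the short job is never dispatched, and there is nothing that could trigger the suspension of a long job.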
> If I remove the consumable, preemption starts working. But in this
> case, jobs might allocate more memory than there is RAM, causing
Correct, but the node would only swap once, to move the stopped
(i.e. suspended) jobs out of memory.
> What is the appropriate way to use consumable resources and
> Also, when preemption does occur, all instances of the "long" queue
> are suspended, even though there might only be one instance on the
> "short" queue. Is it possible to configure SGE in such a way that
> it would suspend only as many instances of "long" queue as there
> are instances of the "short" queue, e.g., if 4 long jobs are
> running and 1 short job gets scheduled, then only 1 long job is
Not directly; for now this is the intended behavior. What you can
try is to define "suspend_thresholds" in the long queue, instead of
defining the subordination in the short queue. When too many jobs are
running and the node is overloaded for a short time, the long queue
will suspend its jobs one after the other until the threshold is no
longer exceeded. For this, np_load_avg=1 could be used as the
threshold.
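As a rough illustration of this step-by-step suspension, here is a small Python sketch. It is a simplified model under several assumptions, not a description of the scheduler: the function name suspend_steps is hypothetical, every running job is assumed to add exactly 1 to the load, np_load_avg is taken as that load divided by the number of cores (without the smoothing of a real load average), one job is suspended per interval (as with nsuspend set to 1), and the threshold is treated as exceeded only when the normalized load is strictly above it.

```python
def suspend_steps(cores, long_running, short_running, threshold=1.0):
    """Return the number of running long jobs after each suspend interval.

    Toy model: stops as soon as the normalized load no longer exceeds
    the threshold, or when no long job is left to suspend.
    """
    history = []
    while long_running > 0 and (long_running + short_running) / cores > threshold:
        long_running -= 1  # assumption: one job suspended per interval
        history.append(long_running)
    return history


one_short = suspend_steps(cores=4, long_running=4, short_running=1)
two_short = suspend_steps(cores=4, long_running=4, short_running=2)
print(one_short, two_short)
```

Under these assumptions, one short job on a 4-core node leads to one suspended long job, and two short jobs lead to two, which is the behavior asked for in the question.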