[GE users] preemption and consumables

Reuti reuti at staff.uni-marburg.de
Fri Oct 10 10:56:02 BST 2008


On 10.10.2008 at 03:39, Serge Nosov wrote:

> My cluster consists of 4-core machines. There are two queues in the  
> cluster: "long" and "short". "long" is subordinate to "short".
> To fully utilize all the cores, I set up 4 slots for each queue on  
> each system. To avoid oversubscribing the resources, I set up a
> consumable "memory".

Often the built-in complexes "h_vmem" and "virtual_free" are made
consumable for this purpose, but you can achieve the same with your
custom complex. The only difference is with "h_vmem": jobs are killed
when they exceed the requested amount.
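As a rough illustration only, here is a toy Python model of that difference; it is not how SGE implements complexes. The class Host and the functions try_dispatch and runtime_check are hypothetical names, and the 16 GB capacity and 4 GB default request are taken from the question. Both kinds of complex are booked the same way when a job is dispatched; with "h_vmem" the request additionally acts as a hard limit at run time.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Toy model of one exec host with a consumable memory complex (GB).

    Hypothetical sketch, not SGE's implementation.
    """
    capacity: float
    booked: float = 0.0

    def try_dispatch(self, request: float) -> bool:
        # A consumable is bookkeeping: the request is subtracted from the
        # host's remaining amount, regardless of what the job really uses.
        if self.booked + request > self.capacity:
            return False  # not enough left: the job stays pending
        self.booked += request
        return True


def runtime_check(usage: float, request: float, enforced: bool) -> str:
    # Plain custom complex (enforced=False): a job using more than it
    # requested keeps running. h_vmem-like (enforced=True): the request
    # is also a hard limit, so exceeding it kills the job.
    if enforced and usage > request:
        return "killed"
    return "running"


host = Host(capacity=16.0)
print([host.try_dispatch(4.0) for _ in range(5)])                  # the 5th 4 GB job does not fit
print(runtime_check(usage=6.0, request=4.0, enforced=False))     # custom complex
print(runtime_check(usage=6.0, request=4.0, enforced=True))      # h_vmem-like limit
```

With 16 GB and 4 GB per job, the first four dispatches succeed and the fifth is refused, which is exactly the "no more than 4 jobs" limit described in the question.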

> Each host has 16g of "memory" and each job by default uses 4g. This
> way no more than 4 jobs using 4g of "memory" are allowed to run on
> any node. The problem arises, however, when a job is submitted to the
> "short" queue on a node that is running jobs in all 4 "long" queue
> slots. I want the short job to preempt one of the long jobs. This does
> not happen because there is no more "memory" consumable available, so
> the short job ends up waiting.
> If I remove the consumable, preemption starts working. But in this  
> case, jobs might allocate more memory than there is RAM, causing  
> swapping.

Correct, but the node would swap only once, to move the stopped (i.e.
suspended) jobs out of memory.
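A toy Python model of why the short job waits may help; it is a sketch under stated assumptions, not SGE's scheduler. The Node class and its submit_short method are hypothetical. The model assumes that the short job must be dispatched before queue subordination can suspend anything, and that the dispatch is refused as soon as the "memory" consumable is exhausted, which matches the behavior described in the question.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy 4-core node with a 'memory' consumable in GB (hypothetical model)."""
    memory: float = 16.0
    long_jobs: list = field(default_factory=list)   # booked GB per long job
    short_jobs: list = field(default_factory=list)  # booked GB per short job
    long_suspended: bool = False

    def free_memory(self) -> float:
        # Everything booked by running jobs counts against the consumable.
        return self.memory - sum(self.long_jobs) - sum(self.short_jobs)

    def submit_short(self, request: float, use_consumable: bool) -> bool:
        # Assumption: the consumable check happens at dispatch, before
        # subordination could suspend any long job to make room.
        if use_consumable and request > self.free_memory():
            return False  # short job stays pending; nothing gets suspended
        self.short_jobs.append(request)
        # Queue-wise subordination: the whole long queue instance is suspended.
        self.long_suspended = True
        return True


for flag in (True, False):
    node = Node(long_jobs=[4.0] * 4)
    started = node.submit_short(4.0, use_consumable=flag)
    print(f"consumable={flag}: short started={started}, long suspended={node.long_suspended}")
```

With the consumable, the short job stays pending and nothing is suspended. Without it, the short job starts and the whole long queue instance is suspended; per the answer above, the memory of those suspended jobs would then be swapped out once.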

> What is the appropriate way to use consumable resources together with
> preemption?
> Also, when preemption does occur, all instances of the "long" queue
> are suspended, even though there might be only one instance on the
> "short" queue. Is it possible to configure SGE so that it suspends
> only as many instances of the "long" queue as there are instances of
> the "short" queue? For example, if 4 long jobs are running and 1 short
> job gets scheduled, only 1 long job would be suspended.

Not directly; for now, this is the intended behavior. What you can try
is to define "suspend_thresholds" in the long queue instead of defining
the subordination in the short queue. When too many jobs are running
and thus overloading the node for a short time, the long queue will
suspend its jobs one after the other. For this, a threshold of
np_load_avg=1 could be defined.
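The effect of such a threshold on the 4-core nodes from the question can be sketched in Python. This is a simplified model, not SGE's exact algorithm: the function simulate is hypothetical, each running job is assumed to add 1.0 to the load, np_load_avg is taken as load divided by cores without the lag of a real load average, and one long job is assumed to be suspended per evaluation while the threshold is exceeded (in the real queue configuration, nsuspend and suspend_interval govern how many jobs and how often).

```python
def simulate(long_running: int, short_running: int, cores: int = 4,
             threshold: float = 1.0, intervals: int = 5) -> list:
    """Toy model of suspend_thresholds np_load_avg=1 on the long queue.

    Assumptions (hypothetical, not SGE's exact behavior): every running job
    adds 1.0 to the load, np_load_avg = load / cores with no averaging lag,
    and one long job is suspended per evaluation while the threshold is
    exceeded. Returns the number of running long jobs after each evaluation.
    """
    history = []
    for _ in range(intervals):
        np_load = (long_running + short_running) / cores
        if np_load > threshold and long_running > 0:
            long_running -= 1  # suspend one long job, not the whole queue
        history.append(long_running)
    return history


print(simulate(long_running=4, short_running=1))  # one short job
print(simulate(long_running=4, short_running=2))  # two short jobs
```

With one short job, only one long job is suspended, and with two short jobs two are suspended, one per evaluation; this is the behavior asked for, in contrast to subordination, which suspends the whole long queue instance at once. A real load average reacts with a delay, so the suspensions will lag behind the arrival of short jobs.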

-- Reuti

