[GE users] fair sharing with preemption

Reuti reuti at staff.uni-marburg.de
Fri Nov 30 18:53:22 GMT 2007


On 30.11.2007 at 19:21, Matt DeVuyst wrote:

> Suppose I have two users, Alice and Bob.
> Alice submits a bunch of long-running jobs and uses 100% of the  
> processors.
> Now Bob comes along and submits jobs.
> I want half of Alice's running jobs to get kicked off (suspended,
> restarted later, whatever) and 50% of the processors to go to Bob's
> jobs, assuming Bob has submitted enough jobs to saturate half the
> cluster (if he only submits a couple of jobs, then only a couple of
> Alice's jobs should get preempted).
> This behavior should scale to the number of users--so, for example, if
> Charles submits jobs, some of Alice and Bob's jobs should get
> preempted and Charles should get 33% of the processors.
> This seems like a very straightforward and reasonable policy that
> should be easy to implement, but the solution eludes me.

The question would be which of Alice's and Bob's jobs to suspend.
The oldest? The newest? And with checkpointing, or by losing the
intermediate results (i.e. wasted CPU time)?
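To make that trade-off concrete, here is a minimal Python sketch. It is not anything SGE provides: the `Job` record, its fields and the `pick_victims` helper are hypothetical names made up for illustration. It assumes one slot per job, and that a job which cannot checkpoint loses all of its CPU time so far when it is preempted.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; these fields are assumptions, not SGE's data model.
    job_id: int
    owner: str
    start_time: float          # seconds, same clock as `now` below
    checkpointable: bool = False


def pick_victims(running, owner, count, now, policy="newest"):
    """Choose up to `count` of `owner`'s running jobs to preempt.

    policy="newest" preempts the most recently started jobs first,
    policy="oldest" preempts the longest-running jobs first.
    Returns (victims, wasted_seconds), where wasted_seconds sums the
    runtime of the victims that cannot checkpoint.
    """
    if policy not in ("newest", "oldest"):
        raise ValueError("policy must be 'newest' or 'oldest'")
    mine = [job for job in running if job.owner == owner]
    # Newest first means largest start_time first.
    mine.sort(key=lambda job: job.start_time, reverse=(policy == "newest"))
    victims = mine[:count]
    wasted = sum(now - job.start_time
                 for job in victims if not job.checkpointable)
    return victims, wasted
```

Under these assumptions, preempting the newest jobs throws away the least work when nothing can checkpoint, while preempting the oldest ones only becomes cheap if those jobs can write a checkpoint.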

> I've gotten fair sharing to work without preemption by setting the
> 'enforce_user' and 'auto_user_fshare' attributes.
> And I know that subordinate queues provide preemption, but as I've
> experimented with them I can't get them to preempt in the way that I'm
> talking about--either more jobs are suspended than necessary (leaving
> idle slots) or nodes are oversubscribed (with more running jobs than
> processors).
> Does anyone know how to correctly configure this policy?

You will need a co-scheduler which has a policy of its own and suspends
(to checkpoint) or kills some jobs. Right now this is not a feature of
SGE: once a job has started, it will run to its end.
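As a rough illustration of the decision such a co-scheduler would have to make, the following Python sketch computes per-user slot targets for the equal-share policy described in the question. It is only a sketch under stated assumptions: `fair_targets` and `preemptions_needed` are hypothetical helpers, every job is assumed to need exactly one slot, and how the jobs are then actually suspended, checkpointed or killed is left out entirely.

```python
def fair_targets(total_slots, demand):
    """Split `total_slots` equally among users, capped by their demand.

    `demand` maps user -> number of jobs (running plus pending).
    Slots a user cannot use are redistributed to the others, so a user
    with only a few jobs takes only a few slots.
    """
    targets = {user: 0 for user in demand}
    active = {user for user, jobs in demand.items() if jobs > 0}
    free = total_slots
    while free > 0 and active:
        # Hand out at least one slot per round so the loop always progresses.
        share = max(free // len(active), 1)
        for user in sorted(active):
            if free == 0:
                break
            give = min(share, demand[user] - targets[user], free)
            targets[user] += give
            free -= give
        # Users whose demand is satisfied drop out of later rounds.
        active = {user for user in active if targets[user] < demand[user]}
    return targets


def preemptions_needed(running, targets):
    """Number of running jobs each user would have to give up."""
    return {user: max(running.get(user, 0) - targets[user], 0)
            for user in targets}


# Alice fills a 100-slot cluster, then Bob submits only three jobs.
targets = fair_targets(100, {"alice": 500, "bob": 3})
print(targets)
print(preemptions_needed({"alice": 100, "bob": 0}, targets))
```

With these inputs only three of Alice's jobs would have to make room, which matches the "only a couple of Alice's jobs should get preempted" case from the question. When the slots do not divide evenly, the leftover slots go to users in alphabetical order, which is an arbitrary tie-break chosen for the sketch.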

-- Reuti
