[GE users] automatic suspension on full cluster

reuti reuti at staff.uni-marburg.de
Wed Feb 10 15:28:52 GMT 2010


Hi,

On 10.02.2010 at 15:00, massot wrote:

> I'm trying to deal with the issue of a full cluster (no available slot).
> I have a cluster to which all users have equal access. The problem is
> that when there's no remaining slot, some users have to wait until the
> jobs of other users, who sometimes run a lot of jobs, have ended. It's
> unfair. On the other hand, using individual job quotas is often a waste
> of resources, since the cluster may then be left only partly used.

The idea behind SGE is that once a job is running, it will not be
pushed back into a waiting state but will run to completion. Your
request is similar to this one, I think:

http://gridengine.sunsource.net/ds/viewMessage.do?dsMessageId=47284&dsForumId=38


> Here is the ideal configuration for my cluster. Anyone can submit as
> many jobs as he wants if the cluster is not full. If the cluster is full
> and someone wants to submit a job, instead of this job pending, the
> person who runs the largest number of jobs gets one of his jobs
> suspended, and a slot is freed so that the first person's job can run.
> As soon as slots become available again, jobs suspended because of the
> full cluster are resumed.
> I could build a system based on cron jobs that suspend and resume jobs
> and adjust the "slots" queue attribute on the fly, but that sounds like
> a rather ugly solution.
> Can you think of an elegant way to configure my ideal cluster?

Hence, for now, a co-scheduler would be the way to do it. Another
discussion about this topic led to:

http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=209873
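To make the requested policy concrete, here is a minimal Python sketch of the decision logic such a co-scheduler would have to implement. It is only a toy model under stated assumptions: the `Cluster` class and its `submit`/`finish` methods are hypothetical, every job is assumed to need exactly one slot, and nothing here talks to SGE. A real co-scheduler would still have to apply each decision through SGE's own tools (qmod can suspend and unsuspend jobs).

```python
from dataclasses import dataclass, field

Job = tuple[str, str]  # (job_id, user); assumption: every job needs one slot


@dataclass
class Cluster:
    """Toy model of the requested policy; it does not talk to SGE."""
    slots: int
    running: list[Job] = field(default_factory=list)    # in start order
    suspended: list[Job] = field(default_factory=list)  # oldest suspension first
    pending: list[Job] = field(default_factory=list)

    def free_slots(self) -> int:
        return self.slots - len(self.running)

    def submit(self, job_id: str, user: str) -> None:
        job = (job_id, user)
        if self.free_slots() > 0:
            self.running.append(job)
            return
        # Cluster is full: count running jobs per user.
        counts: dict[str, int] = {}
        for _, u in self.running:
            counts[u] = counts.get(u, 0) + 1
        heaviest = max(counts, key=counts.get)
        if counts.get(user, 0) >= counts[heaviest]:
            # The submitter already runs the most jobs, so the new job waits.
            self.pending.append(job)
            return
        # Suspend the most recently started job of the heaviest user.
        victim = next(j for j in reversed(self.running) if j[1] == heaviest)
        self.running.remove(victim)
        self.suspended.append(victim)
        self.running.append(job)

    def finish(self, job_id: str) -> None:
        self.running = [j for j in self.running if j[0] != job_id]
        # Freed slots go to suspended jobs first, then to pending ones.
        while self.free_slots() > 0 and (self.suspended or self.pending):
            queue = self.suspended if self.suspended else self.pending
            self.running.append(queue.pop(0))


c = Cluster(slots=3)
for i in range(3):
    c.submit(f"a{i}", "alice")
c.submit("b0", "bob")   # cluster full: one of alice's jobs is suspended
print(c.suspended)      # [('a2', 'alice')]
c.finish("b0")          # the freed slot resumes the suspended job
print(c.running)        # [('a0', 'alice'), ('a1', 'alice'), ('a2', 'alice')]
```

Counting only running jobs, suspending the newest job of the heaviest user, and letting that user's own new submissions wait are choices made for this sketch; the original request leaves them open.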

I also had the idea to introduce something like a "suspendable yes/no"
flag, so that a user could submit up to a maximum number of
non-suspendable jobs and thereby decide on his own which of his jobs
are less important:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=3162
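The proposed flag does not exist in SGE; it is only a feature request. As a rough illustration of how it could interact with preemption, here is a hypothetical Python sketch: the `suspendable` field, the default limit of 20, and the rule of preferring the user with the most running jobs are all assumptions, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    user: str
    suspendable: bool  # hypothetical per-job flag chosen by the user


def may_submit(existing: list[Job], new: Job, limit: int = 20) -> bool:
    """Accept a non-suspendable job only while the user holds fewer than `limit`."""
    if new.suspendable:
        return True
    held = sum(1 for j in existing if j.user == new.user and not j.suspendable)
    return held < limit


def pick_victim(running: list[Job]) -> Optional[Job]:
    """Pick a suspendable job, preferring the user with the most running jobs."""
    counts: dict[str, int] = {}
    for j in running:
        counts[j.user] = counts.get(j.user, 0) + 1
    candidates = [j for j in running if j.suspendable]
    if not candidates:
        return None  # nothing may be suspended, so the new job has to wait
    return max(candidates, key=lambda j: counts[j.user])


running = [Job("a1", "alice", False), Job("a2", "alice", True), Job("b1", "bob", True)]
print(pick_victim(running).job_id)                              # a2
print(may_submit(running, Job("a3", "alice", False), limit=1))  # False
```

The point of the flag is that the user, not the scheduler, decides which of his jobs may be sacrificed; the scheduler only enforces the cap on non-suspendable jobs.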

==

If the resources (which are still occupied by a suspended job) are not
a problem, you could use the new slot-wise suspend in SGE 6.2u5 and
create a "main" and a "background/secondary" queue. Jobs submitted to
the main queue would be limited to a maximum of 20 (or whatever) per
user. But a user can still submit to the background queue to fill the
cluster, and when a new job arrives in the main queue, one job in the
background queue will be suspended. However, it's not necessarily a job
of the user with the most jobs in the cluster; it's just the one with
the shortest or longest runtime.
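The behaviour of this main/background setup can be modelled in a few lines. The sketch below is a plain Python simulation, not SGE configuration: `threshold` stands for the number of slots on an execution host, the "sr"/"lr" labels for the shortest/longest-runtime choice are assumed here, and the function names are hypothetical. In SGE itself the suspension would be set up in the main queue's subordinate_list (check queue_conf(5) for 6.2u5 for the exact slot-wise syntax), and the per-user cap would more likely be a resource quota set.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    user: str
    start: float  # start time in seconds; larger means started later


def may_enter_main(main: list[Job], user: str, limit: int = 20) -> bool:
    """Per-user cap on the main queue (20, as in Reuti's example)."""
    return sum(1 for j in main if j.user == user) < limit


def jobs_to_suspend(main: list[Job], background: list[Job], threshold: int,
                    action: str = "sr") -> list[Job]:
    """Background jobs to suspend so that at most `threshold` jobs keep running.

    "sr" suspends the shortest-running (most recently started) jobs first,
    "lr" the longest-running ones. The job owner plays no role at all.
    Main-queue jobs are never suspended, so the result may fall short of
    the excess if there are not enough background jobs.
    """
    if action not in ("sr", "lr"):
        raise ValueError("action must be 'sr' or 'lr'")
    excess = len(main) + len(background) - threshold
    if excess <= 0:
        return []
    order = sorted(background, key=lambda j: j.start, reverse=(action == "sr"))
    return order[:excess]


bg = [Job("b1", "bob", 100.0), Job("b2", "carol", 200.0)]
main = [Job("m1", "alice", 300.0)]
print([j.job_id for j in jobs_to_suspend(main, bg, threshold=2)])               # ['b2']
print([j.job_id for j in jobs_to_suspend(main, bg, threshold=2, action="lr")])  # ['b1']
```

Because the victim is chosen purely by runtime, a user with many background jobs is not singled out; that is the gap between this setup and the policy originally asked for.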

-- Reuti


> -- 
> Bernard Massot
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=244250
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=244259
