[GE users] Newbie: @group and fractional usage

Reuti reuti at staff.uni-marburg.de
Wed May 17 12:07:42 BST 2006


Hi David,

On 17.05.2006 at 07:03, David Kulp wrote:

> I have two questions as a new grid engine user.
>
> First, I'm running on Linux and attempts to create a userlist with
> the @unixgroup notation don't seem to work.  qmon accepts it, but
> subsequent commands don't recognize it.  For example, I added  
> "@group" to the deadlineusers userset, but when I try to submit  
> deadline jobs I get an error 'job rejected: the user "dkulp" is no  
> deadline initiation user'.  Deadline job submission only works when  
> I add my explicit username in the deadlineusers userset.  But I  
> don't want to do that for every new user.
>
>
> Second, I would like to implement a usage policy that removes  
> (reschedules/migrates) a user's jobs from running queues if the  
> user is

By default, this policy isn't implemented in SGE. Even if there
were such a policy, it would be hard to decide which of a user's
running jobs to kill.

> currently exceeding his fractional share and there is a demand for  
> resources.  I've set up a share tree, which works well when all  
> running jobs are short.  However, we want a policy that preempts  
> running programs according to that share tree policy.
>
> I would think that our scenario is common, but I haven't found  
> anything on this.  Our compute cluster is fractionally owned by  
> multiple groups; that is, different groups have contributed nodes.   
> Usage is bursty, but jobs can sometimes run for days.  Suppose
> Alice and Bob each own 50% of the cluster.  Initially the cluster  
> is idle, so when Alice submits her jobs they fill up all the queues  
> for 100% utilization.  Then Bob wants to run his jobs.  If Alice's  
> jobs are short, then the share tree policy would quickly balance  
> out the resource usage to 50-50.  But if Alice's jobs run for days,  
> then Bob is stuck waiting.  Alice and Bob would prefer if Alice's  
> job was just terminated (or checkpointed) and rescheduled.

What you can try is a script that runs periodically, parses the
qstat output, and suspends some jobs if it discovers that another
user's jobs should run. With checkpointing defined, the suspend will
checkpoint and reschedule the job; alternatively, you could
reschedule the jobs directly yourself.
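
A rough sketch of such a script in Python, not a tested tool. It
assumes the plain qstat column order (job-ID, prior, name, user,
state, ...), counts one slot per job, and takes the per-user shares
and the total slot count as hand-supplied inputs. The function names
are made up for illustration, and "qmod -sj" should be checked against
the qmod man page of your SGE version. Which jobs to pick is an
arbitrary choice here (the last ones listed for the over-share user).

```python
from collections import defaultdict


def parse_qstat(text):
    """Split qstat output into running jobs per user and users with waiting jobs."""
    running = defaultdict(list)
    pending = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and blank lines.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        job_id, user, state = fields[0], fields[3], fields[4]
        if state == "r":
            running[user].append(job_id)
        elif state == "qw":
            pending.add(user)
    return dict(running), pending


def jobs_to_suspend(running, pending, shares, total_slots):
    """Pick jobs of over-share users, but only if an under-share user is waiting."""
    starved = [
        user for user in pending
        if len(running.get(user, [])) < shares.get(user, 0) * total_slots
    ]
    if not starved:
        return []
    victims = []
    for user, jobs in running.items():
        allowed = int(shares.get(user, 0) * total_slots)
        excess = len(jobs) - allowed
        if excess > 0:
            # Arbitrary policy: take the last listed jobs of this user.
            victims.extend(jobs[-excess:])
    return victims


def suspend_commands(victims):
    """Build (but do not run) one suspend command per job; the flag is an assumption."""
    return [["qmod", "-sj", job_id] for job_id in victims]
```

Run from cron, the script would feed the output of qstat into
parse_qstat() and pass each resulting command to subprocess.run().
Printing the commands first is a cheap way to check the selection
before anything is actually suspended.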

HTH - Reuti


>
> The only solution that I can think of is to create two queues for  
> every host, one queue for Alice and one for Bob.  On 50% of the  
> hosts the Alice queue will be subordinate to Bob.  Vice versa on  
> the other half.  But this requires a lot of manual queue  
> configuration as the number of cluster owners increases.  It would  
> be nice if there were some more general scheme like the share  
> tree.  In other words, I would like the share tree to drive the
> preemption policy.  Any ideas?
>
> Thanks in advance.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
