[GE users] Newbie: @group and fractional usage

David Kulp dkulp+sge at cs.umass.edu
Wed May 17 06:03:25 BST 2006

I have two questions as a new grid engine user.

First, I'm running on Linux, and my attempts to create a userset with
the @unixgroup notation don't seem to work.  qmon accepts the entry,
but subsequent commands don't recognize it.  For example, I added
"@group" to the deadlineusers userset, but when I try to submit
deadline jobs I get the error 'job rejected: the user "dkulp" is no
deadline initiation user'.  Deadline job submission only works when I
add my username explicitly to the deadlineusers userset, and I don't
want to do that for every new user.

Second, I would like to implement a usage policy that removes  
(reschedules/migrates) a user's jobs from running queues if the user  
is currently exceeding his fractional share and there is a demand for  
resources.  I've set up a share tree, which works well when all
running jobs are short.  However, we also want running jobs to be
preempted according to that share tree.

I would think that our scenario is common, but I haven't found  
anything on this.  Our compute cluster is fractionally owned by  
multiple groups; that is, different groups have contributed nodes.   
Usage is bursty, but jobs can sometimes run for days.  Suppose Alice
and Bob each own 50% of the cluster.  Initially the cluster is idle,  
so when Alice submits her jobs they fill up all the queues for 100%  
utilization.  Then Bob wants to run his jobs.  If Alice's jobs are  
short, then the share tree policy would quickly balance out the  
resource usage to 50-50.  But if Alice's jobs run for days, then Bob
is stuck waiting.  Alice and Bob would both prefer that some of
Alice's jobs simply be terminated (or checkpointed) and rescheduled.
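
To make the rule I have in mind concrete, here is a rough Python
sketch.  It is not a Grid Engine feature or API; the function name,
its parameters, and the slot counts are all made up for illustration.
It only computes how many running slots each over-share user would
have to give up so that users with pending jobs can reach their
entitlement.

```python
from typing import Dict


def slots_to_reclaim(total_slots: int,
                     shares: Dict[str, float],
                     running: Dict[str, int],
                     pending: Dict[str, int]) -> Dict[str, int]:
    """Return how many running slots each over-share user should give up.

    Hypothetical helper: shares are fractions of the cluster, running and
    pending are slot counts per user. Nothing here talks to Grid Engine.
    """
    # Entitlement in slots, rounded down.
    entitled = {u: int(total_slots * s) for u, s in shares.items()}

    # Slots a user holds beyond the entitlement.
    excess = {u: max(0, running.get(u, 0) - entitled[u]) for u in shares}

    # Demand from users who are below their entitlement and have pending work.
    demand = sum(
        min(pending.get(u, 0), max(0, entitled[u] - running.get(u, 0)))
        for u in shares
    )

    # Idle slots satisfy demand first; only the remainder needs preemption.
    free = total_slots - sum(running.values())
    needed = max(0, demand - free)

    reclaim: Dict[str, int] = {}
    # Take from the users with the largest excess first.
    for user in sorted(excess, key=excess.get, reverse=True):
        take = min(excess[user], needed)
        if take:
            reclaim[user] = take
        needed -= take
    return reclaim
```

With 100 slots, 50% shares each, Alice running on all 100 slots and
Bob having 80 slots' worth of pending jobs, this returns
{'alice': 50}.  If Bob has nothing pending, it returns {} and Alice
keeps the whole cluster.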

The only solution that I can think of is to create two queues on
every host, one for Alice and one for Bob.  On 50% of the hosts,
Alice's queue would be subordinate to Bob's, and vice versa on the
other half.  But this requires a lot of manual queue configuration as
the number of cluster owners increases.  It would be nice if there
were a more general scheme, like the share tree.  In other words, I
would like the share tree to drive the preemption policy.  Any ideas?
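
For what it's worth, the per-host layout could at least be generated
rather than typed by hand.  Below is a minimal Python sketch under my
own assumptions: the "<owner>.q" queue names and the plan structure
are hypothetical, and it only produces a table of which queue owns
each host and which queues would be subordinate there.  It does not
emit actual queue configuration.

```python
from typing import Dict, List


def subordinate_plan(hosts: List[str],
                     shares: Dict[str, float]) -> Dict[str, Dict[str, object]]:
    """Assign each host an owner in proportion to shares.

    On a host, the owner's queue is the superordinate queue and every
    other owner's queue is listed as subordinate to it.
    Queue names of the form "<owner>.q" are hypothetical.
    """
    owners = sorted(shares)
    total = sum(shares.values())
    plan: Dict[str, Dict[str, object]] = {}

    for i, host in enumerate(hosts):
        # Midpoint of this host's slice of the cumulative share range.
        point = (i + 0.5) / len(hosts) * total
        acc = 0.0
        owner = owners[-1]
        for candidate in owners:
            acc += shares[candidate]
            if point < acc:
                owner = candidate
                break
        plan[host] = {
            "owner_queue": f"{owner}.q",
            "subordinate_queues": [f"{o}.q" for o in owners if o != owner],
        }
    return plan


if __name__ == "__main__":
    layout = subordinate_plan(["n1", "n2", "n3", "n4"],
                              {"alice": 0.5, "bob": 0.5})
    for host, entry in layout.items():
        print(host, entry["owner_queue"], entry["subordinate_queues"])
```

This still leaves one queue per owner on every host, so it only
automates the bookkeeping; it does not remove the growth in
configuration as owners are added.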

Thanks in advance.

