[GE users] Re: "use it or lose it" share tree scheduling

Daniel Templeton Dan.Templeton at Sun.COM
Thu Jun 21 16:07:06 BST 2007


Interesting point.  The reason for the behavior you're seeing is that if 
you set the halflife_decay_list to all -1's, the share tree usage is 
only affected by jobs that are currently running.  The only data the 
system has to go on is the accumulated resource usage of the currently 
running jobs.  Hence, user A with his really long-running job gets 
penalized, while user B who is actually getting more of the resources is 
forgiven his sins because his jobs don't hang around long enough to 
count against him.  Perhaps not exactly intuitive from reading the docs, 
but it's all there in the source code. ;)

Let's talk for a second about how you would fix this issue.  Given that 
with halflife_decay_list as -1, the scheduler can only use information 
from running jobs, how would you look at a snapshot of the running job 
list and decide how to assign priorities?  You implied that ignoring the 
accumulated resource usage would be better, but if you ignore that, what 
have you got?  Even if you were to take, say, a 1 second sampling on the 
jobs' usage, your numbers would still be far from accurate, as the jobs 
will most likely not have uniform resource usage throughout their 
lifetimes.  My point is not that the Grid Engine behavior in this case 
is optimal.  My point is only that I don't see that there is an optimal 
solution, so it's a matter of choosing your shortcomings.

Let me ask the obvious question.  Have you considered using the 
functional policy?  It is what you would expect the share tree to be if 
it were flat and had halflife_decay_list set to -1.  Another option 
might be to use a halflife_decay_list with a very fast decay rate.  That 
may come closer to approximating what you're trying to do than setting 
it to -1.
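
To give a feel for what a fast decay rate buys you, here is a minimal 
sketch of plain half-life decay.  It assumes the textbook formula (usage 
halves once per half-life); the function name and the numbers are made 
up for illustration, and this is not lifted from the Grid Engine source, 
so treat it as a rough picture rather than what the scheduler does.

```python
def decayed_usage(usage: float, elapsed: float, half_life: float) -> float:
    """Textbook half-life decay: usage halves once per half_life.

    Hypothetical helper for illustration only; elapsed and half_life
    just need to be in the same time unit.
    """
    return usage * 0.5 ** (elapsed / half_life)


if __name__ == "__main__":
    # 100 units of usage from a finished job, looked at 10 time units later,
    # under a fast and a slow half-life (both values are arbitrary).
    for half_life in (1.0, 60.0):
        remaining = decayed_usage(100.0, 10.0, half_life)
        print(f"half_life={half_life}: {remaining:.3f} units still counted")
```

With a short half-life, a finished job's usage fades almost completely 
within a few half-lives, but it is still counted for a little while, 
which is the difference from -1, where it drops out the moment the job 
ends.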


> Date: Thu, 21 Jun 2007 09:09:47 -0400
> From: Ryan Thomas <Ryan.Thomas at nuance.com>
> Subject: "use it or lose it" share tree scheduling
> It seems from reading the docs that if the halflife_decay_list elements
> are set to -1 that only the running jobs are used in usage calculation.
> This seems to imply that it's possible to implement a "use it or lose
> it" share tree policy where if any entity in the share tree isn't
> currently using its resources that they will have no future claim on
> them.  I think that this is a fairly intuitive and important scheduling
> policy that should be easy to implement.
> I've tried implementing this and found that it's not that simple by
> reading the code.  The problem is that current usage for a job is
> defined to be the accumulation of all resources consumed by that job
> over its entire run.  If all jobs were approximately the same in their
> resource usage then there would be no problem.  In the case that there
> are wide variations in job length then very strange scheduling results
> occur.  
> Consider the simple example of 2 users who are configured in a share
> tree to each get 50% of the cpu resources on a 10 node grid.  User A
> always runs jobs that take 100000 seconds while user B's jobs only take
> 10 seconds.  If we assume that A and B have enough jobs queued up to
> keep the entire grid busy for a very long time, then the scheduler will
> fairly quickly reach a steady-state where user A can only run 1 job
> while user B gets 9 machines on the grid.  The problem is that user B's
> total usage in this case can never exceed 90 because the longest his
> jobs run is 10 seconds and he can get 9 machines on the grid.  User A's
> usage reaches 90 when only 90 seconds have passed and he has to wait
> another 100000-90 seconds until his usage gets down below user B's so
> that he can get his next job scheduled.  This is very far from the 50/50
> grid split that was specified in the share tree.
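
The arithmetic in the quoted example can be checked with a few lines of 
Python.  This is only a sketch of the numbers given above: it assumes 
usage is counted as one unit per second per running job, and that the 
split has already settled at 1 slot for A and 9 for B as described.  All 
the names are made up for illustration.

```python
NODES = 10
A_JOB_SECONDS = 100_000   # length of each of user A's jobs
B_JOB_SECONDS = 10        # length of each of user B's jobs
A_SLOTS = 1               # assumed steady state from the quoted example
B_SLOTS = NODES - A_SLOTS


def usage_ceiling(slots: int, job_seconds: int) -> int:
    """Most usage the running jobs can show: every slot holds a job about to end."""
    return slots * job_seconds


# User B's running jobs can never show more than 9 * 10 = 90 units.
b_ceiling = usage_ceiling(B_SLOTS, B_JOB_SECONDS)

# A's single job accrues 1 unit per second, so it hits B's ceiling this fast.
a_catch_up_seconds = b_ceiling // A_SLOTS

# For the rest of that job's run, A's usage stays above anything B can show.
a_blocked_seconds = A_JOB_SECONDS - a_catch_up_seconds

a_share = A_SLOTS / NODES

if __name__ == "__main__":
    print(f"B's usage ceiling: {b_ceiling}")
    print(f"A reaches it after {a_catch_up_seconds} s, "
          f"then waits {a_blocked_seconds} s")
    print(f"A's share of the grid: {a_share:.0%} (configured: 50%)")
```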
