[GE users] Need To Be Pointed in Right Direction

craffi dag at sonsorol.org
Wed Nov 5 19:55:50 GMT 2008


Jon (sorry for messing up the spelling earlier)

On Nov 5, 2008, at 2:40 PM, Jon Forrest wrote:

> craffi wrote:
>
>> The Functional Policy has no recording of past resource usage so if
>> you want prior usage to be taken into consideration you need to look
>> at the "Share Tree" policy. The Share Tree is the policy that can
>> "remember" and prioritize/penalize groups and individuals based on
>> past usage of a defined resource.
>
> Let's say I only care about how many jobs users are running
> at the time a job is submitted. In other words, I don't really
> care about any prior usage other than what's going on when
> a job is submitted. How does that change things?

If accounting for past usage is not an issue then you are back in  
Functional Policy land. Using functional policy you can carve up the  
cluster arbitrarily among users, departments and projects.

Some people use a percentage-based method (popular when different
departments have paid different amounts for nodes, and in other
similar shared-owner situations).
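
To make the percentage idea concrete, here is a minimal sketch of the
arithmetic only. The department names and share counts are made up, and
the real scheduler combines several share categories (users, projects,
departments, jobs), so treat this as an illustration rather than what
SGE computes internally:

```python
# Hypothetical functional share counts per department (made-up numbers).
def entitlement(shares):
    """Return each owner's fraction of the total shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {name: count / total for name, count in shares.items()}


dept_shares = {"chemistry": 60, "biology": 30, "physics": 10}
fractions = entitlement(dept_shares)
for name, frac in fractions.items():
    # Show each department's slice of the cluster as a percentage.
    print(f"{name}: {frac:.0%}")
```

Changing one department's share count changes everyone's fraction,
which is why shares are easier to maintain than hard-coded percentages.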

Others use the popular "fairshare by user" approach, which treats every
user equally. If you are interested in a trivial-to-set-up "fairshare
by user" policy then check out these links:

Video screencast: http://www.screencast.com/t/Qb6tBziDcTU
Article: http://gridengine.info/2006/01/17/easy-setup-of-equal-user-fairshare-policy

My default recommendation for new SGE admins is to run with
fairshare-by-user for a few weeks or months until you are able to
accurately characterize your real-world resource allocation
requirements. Once real users and jobs have been running for a while
you'll have an idea of what rough spots you want to change or what SGE
knobs you want to tweak.

Since you specifically mentioned "how many jobs users are running at a
time", I think you would also be very interested in SGE Resource
Quotas -- a feature that showed up in SGE 6.1 and is already very
powerful and flexible. Quotas are covered in the Sun wiki and people
are tracking common use cases here:

http://wiki.gridengine.info/wiki/index.php/RQS_Common_Uses

Resource Quotas are great if you want to put some other throttle or
limit on top of a broader, more general resource allocation or
scheduling policy.
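
As a rough sketch of what "a throttle on top of a policy" means: the
policy decides the order of the pending jobs, and the quota then holds
back anything that would push a user over a cap. The function name,
the per-user job cap and the data layout below are all hypothetical;
real Resource Quotas are written as rule sets over resources such as
slots, not as Python:

```python
def dispatch(ordered_pending, running_by_user, free_slots, per_user_cap):
    """Start jobs in policy order, skipping users already at the cap.

    ordered_pending: list of (job_id, user), already sorted by the policy
    running_by_user: dict of user -> number of jobs currently running
    """
    counts = dict(running_by_user)  # copy so the caller's data is untouched
    started = []
    for job_id, user in ordered_pending:
        if len(started) == free_slots:
            break  # nothing left to hand out this scheduling pass
        if counts.get(user, 0) >= per_user_cap:
            continue  # the cap holds this job back; it stays pending
        counts[user] = counts.get(user, 0) + 1
        started.append(job_id)
    return started


started = dispatch(
    [(1, "alice"), (2, "alice"), (3, "bob")],
    {"alice": 1},
    free_slots=2,
    per_user_cap=2,
)
print(started)
```

Here alice already has one job running, so only one more of hers
starts; the second free slot goes to bob even though his job was
further down the list.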


The single most important thing to understand about SGE and resource  
allocation is this:

- By default, SGE *never* messes with running jobs. Policies are
implemented by the scheduler constantly sorting and re-sorting the
jobs that are waiting in the "qw" pending state. You will know your
policies are working when you see the order of the pending jobs
changing over time. The big and obvious problem is someone loading up
your entire cluster with tasks that run for days or weeks -- this will
really mess up your policy implementation, and this is why people use
quotas, multiple queues with hard resource limits, etc.
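
A toy illustration of that point, assuming the simplest possible
"fairshare by user" rule (users with fewer running jobs go first). This
is not the SGE scheduler's actual algorithm, and the job IDs and user
names are invented; it only shows that the policy re-orders the pending
list and leaves the running jobs alone:

```python
from collections import Counter


def order_pending(running, pending):
    """Sort pending jobs so users with fewer running jobs go first.

    running: list of user names, one entry per running job (never modified)
    pending: list of (job_id, user) tuples in submission order
    """
    in_use = Counter(running)
    # sorted() is stable, so jobs from equally loaded users keep
    # their submission order.
    return sorted(pending, key=lambda job: in_use[job[1]])


running = ["alice", "alice", "alice", "bob"]
pending = [(101, "alice"), (102, "bob"), (103, "carol"), (104, "alice")]
reordered = order_pending(running, pending)
print(reordered)
```

If alice's three running jobs each last a week, nothing in this
re-ordering helps carol until a slot frees up, which is the problem
described above.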

>
>
>> Probably poking around http://wikis.sun.com/display/GridEngine/Managing+Policies
>>  will help you with the basics
>
> I will gladly look at this.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88141
