[GE users] Re: "use it or lose it" share tree scheduling

Reuti reuti at staff.uni-marburg.de
Thu Sep 13 12:51:23 BST 2007


Hi,

On 13.09.2007, at 00:14, Iwona Sakrejda wrote:

> Hi,
>
> I came back to an old issue
>
> Rayson Ho wrote:
>> OK, now with the correct option name, I can google for the manpage:
>>
>> sge_conf(5):
>>       SHARETREE_RESERVED_USAGE
>>              If this parameter is set to true, reserved usage is
>>              taken for the Grid Engine share tree consumption
>>              instead of measured usage.
>>
>> So it should do what you want...
> Is the reserved usage just the wallclock time that was used, or the
> time the job was requesting?
>
> And if a job does not specify any wallclock limit, is the limit of
> the queue used?
>
> I want to take into account the wallclock time, but only the time  
> that was actually used.
> The man page is not quite clear about it...

the CPU/MEM usage will be accounted as 100% of the requested amount,
even if e.g. your parallel job is using only one thread instead of
four.
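
In other words, a back-of-the-envelope sketch (plain Python, not the
actual qmaster accounting code; the exact formula is only my reading of
the man page text above): every granted slot is charged for the full
wallclock, regardless of what the job really did. If I remember
correctly, SHARETREE_RESERVED_USAGE is set in the execd_params of the
cluster configuration (qconf -mconf).

    # Rough illustration only; the function names are made up, not Grid Engine API.
    def measured_cpu(per_slot_cpu_seconds):
        # What the execds actually measured on each granted slot.
        return sum(per_slot_cpu_seconds)

    def reserved_cpu(granted_slots, wallclock_seconds):
        # Reserved usage: every granted slot is charged for the full wallclock.
        return granted_slots * wallclock_seconds

    # A 4-slot parallel job that ran for 3600 s but kept only one thread busy:
    print(measured_cpu([3600, 0, 0, 0]))  # 3600  -> measured usage
    print(reserved_cpu(4, 3600))          # 14400 -> what the share tree is charged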

-- Reuti


> Thank You,
>
> Iwona
>
>
>
>>
>> Rayson
>>
>>
>> On 6/21/07, Iwona Sakrejda <isakrejda at lbl.gov> wrote:
>>> I looked at that entry and it seems to me that it refers to proper
>>> accounting of wall clock, and not to substituting wall clock for CPU
>>> in the priority calculation...
>>> The issue is not with charging, but with what is taken into account
>>> when priorities based on shares and usage are calculated...
>>>
>>> Iwona
>>>
>>> >>
>>> >>> Another problem I am having is that array jobs seem to be
>>> >>> overcharged when the usage is calculated (could you point me to
>>> >>> the section of code that deals with it? I'll be happy to read it).
>>> >>> Looks like each array job gets the CPU usage of the whole array.
>>> >>> Array jobs are very helpful, but users are fleeing from them in
>>> >>> droves.....
>>> >>
>>> >> How to reproduce it? Is it a parallel or serial job?
>>> >>
>>> >> The CPU usage is collected by the execds on each node... and then
>>> >> sent to the qmaster before it gets written to the accounting file.
>>> >>
>>> >> Rayson
>>> >>
>>> >>> Thank You,
>>> >>>
>>> >>> iwona
>>> >>>
>>> >>> Daniel Templeton wrote:
>>> >>>
>>> >>>> Ryan,
>>> >>>>
>>> >>>> Interesting point.  The reason for the behavior you're seeing is
>>> >>>> that if you set the halflife_decay_list to all -1's, the share
>>> >>>> tree usage is only affected by jobs that are currently running.
>>> >>>> The only data the system has to go on is the accumulated resource
>>> >>>> usage of the currently running jobs.  Hence, user A with his
>>> >>>> really long-running job gets penalized, while user B, who is
>>> >>>> actually getting more of the resources, is forgiven his sins
>>> >>>> because his jobs don't hang around long enough to count against
>>> >>>> him.  Perhaps not exactly intuitive from reading the docs, but
>>> >>>> it's all there in the source code. ;)
>>> >>>>
>>> >>>> Let's talk for a second about how you would fix this issue.
>>> >>>> Given that with halflife_decay_list as -1 the scheduler can only
>>> >>>> use information from running jobs, how would you look at a
>>> >>>> snapshot of the running job list and decide how to assign
>>> >>>> priorities?  You implied that ignoring the accumulated resource
>>> >>>> usage would be better, but if you ignore that, what have you got?
>>> >>>> Even if you were to take, say, a 1-second sampling of the jobs'
>>> >>>> usage, your numbers would still be far from accurate, as the jobs
>>> >>>> will most likely not have uniform resource usage throughout their
>>> >>>> lifetimes.  My point is not that the Grid Engine behavior in this
>>> >>>> case is optimal.  My point is only that I don't see that there is
>>> >>>> an optimal solution, so it's a matter of choosing your
>>> >>>> shortcomings.
>>> >>>>
>>> >>>> Let me ask the obvious question.  Have you considered using the
>>> >>>> functional policy?  It is what you would expect the share tree to
>>> >>>> be if it were flat and had hdl set to -1.  Another option might
>>> >>>> be to use a halflife_decay_list with a very fast decay rate.
>>> >>>> That may come closer to approximating what you're trying to do
>>> >>>> than setting it to -1.
>>> >>>>
>>> >>>> Daniel
>>> >>>>
>>> >>>>
>>> >>>>> Date: Thu, 21 Jun 2007 09:09:47 -0400
>>> >>>>> From: Ryan Thomas <Ryan.Thomas at nuance.com>
>>> >>>>> Subject: "use it or lose it" share tree scheduling
>>> >>>>>
>>> >>>>> It seems from reading the docs that if the halflife_decay_list
>>> >>>>> elements are set to -1, only the running jobs are used in the
>>> >>>>> usage calculation.  This seems to imply that it's possible to
>>> >>>>> implement a "use it or lose it" share tree policy where, if any
>>> >>>>> entity in the share tree isn't currently using its resources, it
>>> >>>>> will have no future claim on them.  I think that this is a fairly
>>> >>>>> intuitive and important scheduling policy that should be easy to
>>> >>>>> implement.
>>> >>>>>
>>> >>>>> I've tried implementing this and found, by reading the code, that
>>> >>>>> it's not that simple.  The problem is that current usage for a
>>> >>>>> job is defined to be the accumulation of all resources consumed
>>> >>>>> by that job over its entire run.  If all jobs were approximately
>>> >>>>> the same in their resource usage then there would be no problem.
>>> >>>>> In the case that there are wide variations in job length, very
>>> >>>>> strange scheduling results occur.
>>> >>>>>
>>> >>>>> Consider the simple example of 2 users who are configured in a
>>> >>>>> share tree to each get 50% of the CPU resources on a 10-node
>>> >>>>> grid.  User A always runs jobs that take 100000 seconds while
>>> >>>>> user B's jobs only take 10 seconds.  If we assume that A and B
>>> >>>>> have enough jobs queued up to keep the entire grid busy for a
>>> >>>>> very long time, then the scheduler will fairly quickly reach a
>>> >>>>> steady state where user A can only run 1 job while user B gets 9
>>> >>>>> machines on the grid.  The problem is that user B's total usage
>>> >>>>> in this case can never exceed 90, because the longest his jobs
>>> >>>>> run is 10 seconds and he can get 9 machines on the grid.  User
>>> >>>>> A's usage reaches 90 when only 90 seconds have passed, and he has
>>> >>>>> to wait another 100000-90 seconds until his usage gets down below
>>> >>>>> user B's so that he can get his next job scheduled.  This is very
>>> >>>>> far from the 50/50 grid split that was specified in the share
>>> >>>>> tree.
>>> >>>>>
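
P.S. To make the steady state in Ryan's example above concrete, a tiny
bit of arithmetic (plain Python, nothing taken from the scheduler code)
reproduces his numbers:

    # Numbers taken straight from Ryan's mail; with halflife_decay_list
    # set to -1, only the currently running jobs contribute to usage.
    nodes = 10
    job_a_runtime = 100000   # seconds per job for user A
    job_b_runtime = 10       # seconds per job for user B

    nodes_for_a = 1                      # the steady state Ryan describes
    nodes_for_b = nodes - nodes_for_a    # 9

    max_usage_b = nodes_for_b * job_b_runtime   # 90: B's running usage never exceeds this
    time_until_a_passes_b = max_usage_b         # after 90 s, A's single job exceeds 90
    wait_for_next_a_job = job_a_runtime - time_until_a_passes_b   # 99910 s

    print(max_usage_b, time_until_a_passes_b, wait_for_next_a_job)

So user A is effectively frozen out for almost the entire 100000 seconds
of each of his jobs, which is the behaviour Daniel describes above.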

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



