[GE users] Re: "use it or lose it" share tree scheduling

Rayson Ho rayrayson at gmail.com
Thu Jun 21 21:07:56 BST 2007



OK, now with the correct option name, I can google for the manpage:

sge_conf(5):
       SHARETREE_RESERVED_USAGE
              If  this  parameter  is set to true, reserved usage is taken for
              the Grid Engine  share  tree  consumption  instead  of  measured
              usage.

So it should do what you want...
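
For reference (a sketch from memory, untested): it is set through
execd_params in the cluster configuration, e.g.

    qconf -mconf global
    ...
    execd_params                 SHARETREE_RESERVED_USAGE=true

With reserved usage, the share tree is charged for what a job
reserved (e.g. wallclock time times the slots it held) rather than
for the measured rusage.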

Rayson


On 6/21/07, Iwona Sakrejda <isakrejda at lbl.gov> wrote:
> I looked at that entry, and it seems to me that it refers to proper
> accounting of wall clock time, not to substituting wall clock for
> CPU in the priority calculation. The issue is not with charging, but
> with what is taken into account when priorities based on shares and
> usage are calculated.
>
> Iwona
>
> >
> >>> Another problem I am having is that array jobs seem to be
> >>> overcharged when the usage is calculated (could you point me to
> >>> the section of code that deals with it? I'll be happy to read
> >>> it).  Looks like each array job gets the CPU usage of the whole
> >>> array.  Array jobs are very helpful, but users are fleeing from
> >>> them in droves...
> >>
> >> How do you reproduce it?  Is it a parallel or serial job?
> >>
> >> The CPU usage is collected by the execds on each node... and then
> >> sent to the qmaster before it gets written to the accounting file.
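> >>
> >> (One quick check, assuming the array job has already finished:
> >>
> >>     qacct -j <jobid>
> >>
> >> prints one accounting record per task, so comparing each task's
> >> cpu field against the array's total should show whether every
> >> task really gets charged for the whole array.)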
> >>
> >> Rayson
> >>
> >>> Thank You,
> >>>
> >>> iwona
> >>>
> >>> Daniel Templeton wrote:
> >>>
> >>>> Ryan,
> >>>>
> >>>> Interesting point.  The reason for the behavior you're seeing
> >>>> is that if you set the halflife_decay_list to all -1's, the
> >>>> share tree usage is only affected by jobs that are currently
> >>>> running.  The only data the system has to go on is the
> >>>> accumulated resource usage of the currently running jobs.
> >>>> Hence, user A with his really long-running job gets penalized,
> >>>> while user B, who is actually getting more of the resources, is
> >>>> forgiven his sins because his jobs don't hang around long
> >>>> enough to count against him.  Perhaps not exactly intuitive
> >>>> from reading the docs, but it's all there in the source code. ;)
> >>>>
> >>>> Let's talk for a second about how you would fix this issue.
> >>>> Given that with halflife_decay_list as -1 the scheduler can
> >>>> only use information from running jobs, how would you look at a
> >>>> snapshot of the running job list and decide how to assign
> >>>> priorities?  You implied that ignoring the accumulated resource
> >>>> usage would be better, but if you ignore that, what have you
> >>>> got?  Even if you were to take, say, a 1-second sampling of the
> >>>> jobs' usage, your numbers would still be far from accurate, as
> >>>> the jobs will most likely not have uniform resource usage
> >>>> throughout their lifetimes.  My point is not that the Grid
> >>>> Engine behavior in this case is optimal.  My point is only that
> >>>> I don't see that there is an optimal solution, so it's a matter
> >>>> of choosing your shortcomings.
> >>>>
> >>>> Let me ask the obvious question.  Have you considered using the
> >>>> functional policy?  It is what you would expect the share tree
> >>>> to be if it were flat and had halflife_decay_list set to -1.
> >>>> Another option might be to use a halflife_decay_list with a
> >>>> very fast decay rate.  That may come closer to approximating
> >>>> what you're trying to do than setting it to -1.
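> >>>>
> >>>> (Both knobs live in the scheduler configuration.  A sketch,
> >>>> assuming the cpu=N:mem=N:io=N format from sched_conf(5), with
> >>>> half-lives given in minutes:
> >>>>
> >>>>     qconf -msconf
> >>>>     ...
> >>>>     weight_tickets_functional    10000
> >>>>     halflife_decay_list          cpu=5:mem=5:io=5
> >>>>
> >>>> where the first line enables the functional policy (users also
> >>>> need an fshare value, if I recall) and the second decays
> >>>> finished-job usage with a 5-minute half-life.)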
> >>>>
> >>>> Daniel
> >>>>
> >>>>
> >>>>> Date: Thu, 21 Jun 2007 09:09:47 -0400
> >>>>> From: Ryan Thomas <Ryan.Thomas at nuance.com>
> >>>>> Subject: "use it or lose it" share tree scheduling
> >>>>>
> >>>>>   It seems from reading the docs that if the
> >>>>> halflife_decay_list elements are set to -1, only the running
> >>>>> jobs are used in the usage calculation.  This seems to imply
> >>>>> that it's possible to implement a "use it or lose it" share
> >>>>> tree policy, where any entity in the share tree that isn't
> >>>>> currently using its resources has no future claim on them.  I
> >>>>> think that this is a fairly intuitive and important scheduling
> >>>>> policy that should be easy to implement.
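> >>>>>
> >>>>> (For reference, the setting I mean, in the scheduler
> >>>>> configuration, assuming the cpu=N:mem=N:io=N format from
> >>>>> sched_conf(5):
> >>>>>
> >>>>>     qconf -msconf
> >>>>>     ...
> >>>>>     halflife_decay_list    cpu=-1:mem=-1:io=-1
> >>>>>
> >>>>> so that only currently running jobs contribute to share tree
> >>>>> usage.)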
> >>>>>
> >>>>> I've tried implementing this, and from reading the code I
> >>>>> found that it's not that simple.  The problem is that the
> >>>>> current usage for a job is defined to be the accumulation of
> >>>>> all resources consumed by that job over its entire run.  If
> >>>>> all jobs were approximately the same in their resource usage,
> >>>>> there would be no problem.  Where there are wide variations in
> >>>>> job length, very strange scheduling results occur.
> >>>>>
> >>>>> Consider the simple example of 2 users who are configured in a
> >>>>> share tree to each get 50% of the CPU resources on a 10-node
> >>>>> grid.  User A always runs jobs that take 100000 seconds, while
> >>>>> user B's jobs only take 10 seconds.  If we assume that A and B
> >>>>> have enough jobs queued up to keep the entire grid busy for a
> >>>>> very long time, then the scheduler will fairly quickly reach a
> >>>>> steady state where user A can only run 1 job while user B gets
> >>>>> 9 machines on the grid.  The problem is that user B's total
> >>>>> usage in this case can never exceed 90, because the longest
> >>>>> his jobs run is 10 seconds and he can get 9 machines on the
> >>>>> grid.  User A's usage reaches 90 when only 90 seconds have
> >>>>> passed, and he has to wait another 100000-90 seconds until his
> >>>>> usage gets down below user B's so that he can get his next job
> >>>>> scheduled.  This is very far from the 50/50 grid split that
> >>>>> was specified in the share tree.
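> >>>>>
> >>>>> (Working through the numbers: B's running usage is capped at 9
> >>>>> hosts x 10 s = 90 CPU-seconds, while A's single job passes 90
> >>>>> CPU-seconds after only 90 s and then locks A out for the
> >>>>> remaining ~99910 s of its run.  A ends up with roughly 1/10 of
> >>>>> the grid instead of the configured 1/2.)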
> >>>>>
> >>>>>
> >>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net




More information about the gridengine-users mailing list