[GE users] Short/Medium/Long Queue

Steve Pittard wsp at emory.edu
Thu May 25 21:55:37 BST 2006


Hi, I've been running fairly well on an SGE setup that has permitted
three departments to share the cluster. Users are getting immediate
access to the nodes that they have purchased, and in periods of
inactivity other users can use these nodes. In general it has been quite
good. At first we were seeing a clear split between long-running jobs
and short-running jobs, so I set up an express queue to expedite the
fast-running jobs. I wasn't too concerned that a job which runs for,
say, a month is interrupted many times for short periods.

However, I'm now starting to see more medium-length jobs. The job
lengths I'm seeing break down as follows:

* 50% of all jobs are short (< 1 day, usually about half a day)
* 30% of all jobs are medium (1 to 2 weeks)
* 20% of all jobs are long (3 weeks and above)
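
To make the three classes concrete, here is a minimal Python sketch that
maps an expected runtime onto them. The thresholds come from the
breakdown above; the function and constant names are hypothetical, and
in practice the runtime would have to come from the user's own estimate.
Note that the categories leave a gap between 2 and 3 weeks, which the
sketch reports as "unclassified" rather than guessing.

```python
from datetime import timedelta

# Thresholds taken from the job-length breakdown (names are hypothetical).
SHORT_MAX = timedelta(days=1)    # short: under one day
MEDIUM_MAX = timedelta(weeks=2)  # medium: 1 to 2 weeks
LONG_MIN = timedelta(weeks=3)    # long: 3 weeks and above


def classify(runtime: timedelta) -> str:
    """Map an expected runtime onto the short/medium/long classes."""
    if runtime < SHORT_MAX:
        return "short"
    if runtime <= MEDIUM_MAX:
        return "medium"
    if runtime >= LONG_MIN:
        return "long"
    # Between 2 and 3 weeks: not covered by any of the three classes.
    return "unclassified"


for rt in (timedelta(hours=12), timedelta(days=10),
           timedelta(days=17), timedelta(weeks=4)):
    print(rt, "->", classify(rt))
```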

One user (UserA) is responsible for all the short-running jobs (at
least at this point). UserB typically submits nothing but medium jobs.
UserC typically submits long jobs but also some medium jobs. Both UserA
and UserC own an independent subset of the nodes and expect immediate
access to that subset whenever they submit any type of job. (I'm using
the subordinate queue mechanism to accomplish this.) They each have
their own queue to get immediate access on their nodes. As mentioned, I
have an express queue set up such that all other queues are subordinate
to it, which keeps the users with quick-running jobs happy. The users
with long and medium jobs don't really mind that much if their jobs are
interrupted for short periods of time.
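
The subordinate arrangement described above can be modelled roughly as
follows. This is a simplified Python sketch, not Grid Engine code: the
queue names (express.q, userC.q, all.q) and the host name are
hypothetical, and the rule that a single running express job suspends
every other queue instance on the same host is an assumption of the
sketch. (Grid Engine's subordinate_list can, as far as I know, also take
a per-queue slot threshold; the threshold parameter stands in for that.)

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """One execution host and the jobs running in each queue instance on it."""
    name: str
    running: dict[str, int] = field(default_factory=dict)  # queue -> running jobs


def suspended_queues(host: Host, superordinate: str = "express.q",
                     threshold: int = 1) -> set[str]:
    """Return the queue instances on `host` that would be suspended.

    Simplified model: once `threshold` or more jobs run in the superordinate
    queue on this host, every other queue instance on the host is suspended,
    so its jobs stop until the express jobs finish.
    """
    if host.running.get(superordinate, 0) >= threshold:
        return {q for q in host.running if q != superordinate}
    return set()


# Hypothetical node owned by UserC, with one express job just started.
node = Host("node07", {"express.q": 1, "userC.q": 1, "all.q": 2})
print(sorted(suspended_queues(node)))  # ['all.q', 'userC.q']
```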

Now the tricky part is what to do about the medium and long jobs. Those
with long-running jobs aren't at all happy if their jobs become
subordinate to the medium jobs, since that would require them to accept
suspensions of up to 2 or 3 weeks. The users with medium jobs don't want
their jobs to wait in line behind a long-running job. So I was thinking
about a "short", "medium", "long" queue setup. Short jobs run ahead of
all other jobs. Long jobs would always run (as long as there are no
short jobs), and medium jobs would also always run (in the absence of
short jobs), but perhaps at a higher priority than the long-running
jobs. Of course, this is just my first-cut thinking about this.
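
The proposed short/medium/long behaviour can be stated more precisely
with a small Python model of a single node. It is a sketch under
simplifying assumptions: the node is treated as one CPU shared
proportionally among its jobs, and the 2:1 medium-to-long weighting is a
hypothetical value, not a Grid Engine setting. On a real cluster
something like a lower nice value for the medium queue (the queue
priority attribute) might play that role.

```python
from collections import Counter

# Hypothetical relative CPU weights for jobs that share a node.
WEIGHT = {"medium": 2, "long": 1}


def node_state(jobs: list[str]) -> dict[str, float]:
    """Given the classes of the jobs on one node, return the per-job CPU
    fraction for each class ("short", "medium" or "long").

    Policy sketched above:
      * any short job suspends everything else on the node;
      * medium and long jobs never suspend each other: they share the
        node, with medium jobs weighted more heavily.
    """
    counts = Counter(jobs)
    if counts["short"]:
        # Only short jobs run; medium and long jobs are suspended (0.0).
        return {cls: (1.0 / counts["short"] if cls == "short" else 0.0)
                for cls in counts}
    total = sum(WEIGHT[cls] * n for cls, n in counts.items())
    return {cls: WEIGHT[cls] / total for cls in counts}


print(node_state(["medium", "long"]))
# {'medium': 0.6666666666666666, 'long': 0.3333333333333333}
```

Adding a short job to the list drops both the medium and the long job
to zero. The point of the design is that only the short class ever
suspends another class; medium and long only change each other's share
of the node.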

So my goals are these:

* Get short jobs processed above all else (the express queue seems to
handle this).

* Medium jobs and long jobs can run on the same hosts, but perhaps with
the long-running jobs taking a back seat in terms of priority. I don't
want a medium job waiting for a long job to complete, nor do I want a
long job suspended or displaced for a medium job.

* UserA gets immediate access to his nodes. I've set this up by giving
him a queue that spans all his nodes so he can submit to it directly.
He can also use the short-run queue, which gives him better access.

* UserC, who typically submits long jobs, also has a personal queue
which guarantees immediate access, but he also wants to use as many
hosts as possible. UserC doesn't mind if his jobs are suspended many
times by short-running jobs, but he doesn't want his jobs to be
suspended or preempted by medium-running jobs. Sharing a node is fine,
though, even if his jobs are running at a lower priority. (Of course, on
the nodes that he owns he would always like top priority, which might
require direct submission to his personal queue.)

* UserB, who submits medium jobs, does not "own" any nodes but is happy
to get access to all nodes not owned by the other users, as well as
those that aren't in use at the time of submission. UserB doesn't mind
if his jobs are suspended many times by short-running jobs. UserB is
concerned that his jobs will have to wait until the long-running jobs
complete to get access. He doesn't mind sharing a node with a
long-running job, perhaps running at a higher priority on that node,
since his jobs can complete faster.

* I would like to get something working that enables me to talk UserA
and UserC out of needing immediate access to their nodes. I've explained
the sharing concept, wherein departments get shares that ensure certain
usage proportions are guaranteed over time (see Chris Dadigan's document
on this). But the "over time" part is a big problem. The way the users
see it, they paid for the nodes, so they want immediate access. Ideally
I would like to set up a short/medium/long arrangement that is good
enough that this isn't such a concern for them. That is, they don't feel
uneasy if someone else is running on "their" nodes, because the queuing
setup is reasonable, fair, and can be overridden in times of urgency
(which it obviously can be). In any case, I need to come up with
something that makes the medium and long-running jobs appear less
competitive with each other. Sorry for what might be a somewhat rambling
description, but I've seen others here with similar queuing
environments, so I thought I would see what others are doing.
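
The "over time" problem with shares can be illustrated with a simplified
model in which each department's past usage decays with a half-life and
its priority is its entitled share minus its fraction of recent usage.
This is a Python sketch of the idea only, not Grid Engine's actual
share-tree algorithm; the one-week (168 hour) half-life and the
department names A, B and C are hypothetical.

```python
def decayed_usage(samples: list[tuple[float, float]], now: float,
                  half_life: float) -> float:
    """Sum usage samples (time_in_hours, cpu_hours), each one halved for
    every `half_life` hours that have passed since it was recorded."""
    return sum(u * 0.5 ** ((now - t) / half_life) for t, u in samples)


def share_priority(shares: dict[str, float],
                   usage: dict[str, list[tuple[float, float]]],
                   now: float, half_life: float = 168.0) -> dict[str, float]:
    """Entitled fraction minus actual (decayed) usage fraction per department.

    Positive means under-served (its pending jobs should be dispatched
    first); negative means over-served.
    """
    total_share = sum(shares.values())
    decayed = {d: decayed_usage(usage.get(d, []), now, half_life)
               for d in shares}
    total_use = sum(decayed.values()) or 1.0  # avoid dividing by zero when idle
    return {d: shares[d] / total_share - decayed[d] / total_use
            for d in shares}


# Equal shares; A has been idle, B used twice as much as C a week ago.
prio = share_priority({"A": 1.0, "B": 1.0, "C": 1.0},
                      {"B": [(0.0, 1000.0)], "C": [(0.0, 500.0)]},
                      now=168.0)
for dept, p in prio.items():
    print(dept, round(p, 3))
```

The idle department A ends up with the highest value and B goes
negative. But that positive value only moves A's pending jobs to the
front of the line as slots free up; it does not displace anything
already running, which is why owners who expect immediate access find
share-based fairness unsatisfying.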

Regards, Steve wsp at emory.edu
