[GE users] Does nice always work when determining which waiting job to assign to a node first?

Marconnet, James E Mr /Computer Sciences Corporation james.marconnet at smdc.army.mil
Wed Apr 13 15:21:03 BST 2005


Reuti:

After digesting your several suggestions I chose what looked like the
simplest and easiest: using qmon, I set a default seq_no of 50 on each
primary queue and a seq_no of 100 on each secondary queue, and I changed the
scheduler's queue sort order from load to seq_no. As I read it here and in
the Help, this makes the SGE scheduler sort the queue instances first by
seq_no and then, among those with the same seq_no, by load, and assign the
waiting jobs to the available nodes accordingly.

I may not be able to test these changes using the secondary queues for a
while due to a current crunch.

Thanks so much!
Jim

-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Tuesday, April 12, 2005 3:43 PM
To: users at gridengine.sunsource.net
Subject: RE: [GE users] Does nice always work when determining which
waiting job to assign to a node first?

Jim,

I remember the subordination from the "Running 1, 2, 3, or 4 jobs..."
thread. 
But this now looks like a different setup. 

Quoting "Marconnet, James E Mr /Computer Sciences Corporation" 
<james.marconnet at smdc.army.mil>:

> Reuti:
> 
> I have two technical groups, each with what we call a primary queue
> and a secondary queue. So call the queues G1p, G1s, G2p, G2s. The nodes
> are split evenly between the two groups. G1p contains the same nodes
> as G2s, except with nice=0 or 19 respectively. And so forth. The idea
> was to allow both

If one queue is always blocked via the subordination, it makes no
difference whether a single job on a machine runs with nice=0 or nice=19 -
it will get all of the CPU time anyway.

> groups to use as many available nodes as possible, without wasting 
> half the nodes when no one in the other group needs to run anything.
> 
> To prevent running 2 or more jobs on nodes simultaneously, G1p is 
> subordinate to G2s, and so forth, per your suggestion in another thread.
> 
> We thought that setting nice would affect the scheduling of waiting 
> jobs to nodes. Turns out it does not.

Combining the nice values (the queue's priority setting) with the seq_no
could achieve the requested effect:

- I assume you have already set up two user lists, so that each group can
run only in the queues it should use.

- queue G1p:

hostlist @partA
priority 0
seq_no 50

- queue G1s:

hostlist @partB
priority 19
seq_no 100

- queue G2p:

hostlist @partB
priority 0
seq_no 50

- queue G2s:

hostlist @partA
priority 19
seq_no 100


Setting the sort order to seqno will still use load balancing between queue
instances with the same seq_no.
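To make the seq_no ordering concrete, here is a simplified Python model of how a scheduler with queue_sort_method set to seqno would rank the queue instances a job may use: lowest seq_no first, with load only breaking ties. It is a sketch, not SGE's code: it ignores load thresholds, resource requests, and the job-ordering policies, and the host names a1, a2, b1, b2 and the load numbers are made up for illustration. Queue names and seq_no values follow the setup above.

```python
from dataclasses import dataclass


@dataclass
class QueueInstance:
    queue: str    # cluster queue name, e.g. "G2p"
    host: str     # execution host (hypothetical host names below)
    seq_no: int   # the queue's seq_no setting
    load: float   # host load value, used only to break seq_no ties
    free: bool    # True if the instance has a free, unsuspended slot


def candidate_order(instances, allowed_queues):
    """Order the free queue instances a job may use the way a scheduler
    sorting by seqno would try them: lowest seq_no first, and among
    equal seq_no values, lowest load first."""
    usable = [qi for qi in instances if qi.free and qi.queue in allowed_queues]
    return sorted(usable, key=lambda qi: (qi.seq_no, qi.load))


if __name__ == "__main__":
    # Hosts a1, a2 form @partA; b1, b2 form @partB (hypothetical names).
    cluster = [
        QueueInstance("G1p", "a1", 50, 0.5, True),
        QueueInstance("G1p", "a2", 50, 0.1, True),
        QueueInstance("G2s", "a1", 100, 0.5, True),
        QueueInstance("G2s", "a2", 100, 0.1, True),
        QueueInstance("G2p", "b1", 50, 1.0, True),    # a G1s job runs on b1
        QueueInstance("G2p", "b2", 50, 0.0, True),
        QueueInstance("G1s", "b1", 100, 1.0, False),  # slot already taken
        QueueInstance("G1s", "b2", 100, 0.0, True),
    ]
    # A G2 job may use only G2p and G2s (enforced by the user lists).
    for qi in candidate_order(cluster, {"G2p", "G2s"}):
        print(f"{qi.queue}@{qi.host}")
```

For a G2 job this prints G2p@b2, G2p@b1, G2s@a2, G2s@a1: both G2p instances come before either G2s instance even though G2s@a2 is less loaded than G2p@b1, because load only decides among instances with the same seq_no.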

- Jobs submitted by G1 will first fill the machines in queue G1p; if they
submit more, G1s will be used.

- If G2 now submits some jobs, they will go first to the @partB nodes,
choosing the one with the least load.

- If G2 now gets some nodes where a G1s job is already running, the G2 jobs
will get most of the CPU time because the two queues on the same machines
have different nice values. Of course, whether your users will always grant
this to the other group may be a point of discussion. (The only other
options I see: suspend the G1s job [which you don't want], or wait until
the G1s job has finished, i.e. drain the G1s slot *)

- Each group can request its primary queue in qsub (e.g. qsub -q G1p) if
they don't want to run in the background.
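How much is "most of the CPU time"? A rough Python sketch, assuming a Linux kernel with the CFS scheduler (2.6.23 and later, so newer than the kernels in use when this thread was written), where nice 0 has load weight 1024 and nice 19 has weight 15. It also assumes the jobs are CPU-bound and compete for a single CPU; on the dual-processor nodes mentioned in this thread, two jobs could each get their own CPU and nice would matter much less.

```python
# CFS load weights for the two nice levels used here (assumption:
# Linux 2.6.23+; older kernels divided CPU time differently).
NICE_WEIGHT = {0: 1024, 19: 15}


def cpu_shares(nice_values):
    """Fraction of one CPU each CPU-bound job receives when all of
    them compete for the same single CPU, under CFS weights."""
    weights = [NICE_WEIGHT[n] for n in nice_values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(cpu_shares([19]))     # a lone nice=19 job still gets the whole CPU
    print(cpu_shares([0, 19]))  # a G2p job next to a G1s job on one CPU
```

A lone job gets the whole CPU whatever its nice value, which is why nice makes no difference once subordination leaves only one job per node; with one nice=0 and one nice=19 job sharing a CPU, the split is roughly 98.6% to 1.4%.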

Will this come close to what your groups are asking for? - Reuti


*) I think this is what you currently try to achieve by blocking via the
subordination, but depending on the policy, when a job finishes the
scheduler may grant the freed slot to a member of the wrong group first.
A rule like "never use G1s if G2 jobs are waiting", independent of any
other policy, would help. I'm thinking of a load_sensor that blocks G1s
whenever the number of pending G2 jobs is > 0. Then, of course, you
wouldn't need any priority/nice values at all.
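The load sensor idea could look roughly like this. It is a minimal sketch, assuming Python 3.7+ on the execution hosts and the usual SGE load sensor protocol (execd writes a line to the sensor's stdin each report interval and "quit" to stop it; the sensor answers with begin, host:name:value lines, and end). The complex name pending_g2 and the user names are hypothetical, and counting pending jobs with qstat -s p -u is one assumed way to do it. Wiring it up (defining the complex, setting load_sensor in the host or cluster configuration, and a load_thresholds entry on G1s so that a nonzero count puts G1s into alarm state) is not shown and would need checking against the SGE 6.0 documentation.

```python
import socket
import subprocess
import sys

# Hypothetical user names of group G2 (in practice, the G2 user list).
G2_USERS = ["g2user1", "g2user2"]
# Hypothetical complex name; it would first have to be added with qconf -mc.
COMPLEX = "pending_g2"


def count_pending(qstat_output):
    """Count job lines in `qstat -s p` output. Header and separator
    lines are skipped; job lines start with a numeric job ID."""
    count = 0
    for line in qstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            count += 1
    return count


def report(host, value):
    """Format one report in the load sensor protocol."""
    return f"begin\n{host}:{COMPLEX}:{value}\nend\n"


def main():
    # Assumption: this matches the host name SGE uses for this node.
    host = socket.gethostname()
    while True:
        line = sys.stdin.readline()
        # Stop on "quit" or when execd closes stdin; any other line
        # requests a fresh report.
        if not line or line.strip() == "quit":
            break
        result = subprocess.run(
            ["qstat", "-s", "p", "-u", ",".join(G2_USERS)],
            capture_output=True, text=True,
        )
        sys.stdout.write(report(host, count_pending(result.stdout)))
        sys.stdout.flush()


if __name__ == "__main__":
    main()
```

A mirror-image sensor for pending G1 jobs would block G2s in the same way.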

> 
> So if G1 submits a gazillion jobs to both his group's primary and
> secondary queues, then all the cluster nodes get used very efficiently,
> but nobody in G2 gets to start even one job until all the G1 jobs have
> started and at least one of them finishes.
> 
> Yes, we have load thresholds set (since there is some interactive
> testing and since some nodes are dual-processor, hyperthreaded -
> another story!), but we are not currently suspending jobs. Some users
> have short jobs, and some have much longer jobs. So suspending jobs
> seemed iffy.
> 
> Probably clear as mud, but hope it helps someone understand and 
> suggest an approach.
> 
> Jim
> 
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Tuesday, April 12, 2005 1:36 PM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] Does nice always work when determining which
> waiting job to assign to a node first?
> 
> Hi Jim,
> 
> I'm still not sure about your intended setup. You have two cluster
> queues - one with nice=0, the other with nice=19 - plus some
> load_thresholds and subordination?
> 
> One user will submit with qsub -q nice0que, the other with qsub -q nice19que?
> 
> Quoting "Marconnet, James E Mr /Computer Sciences Corporation" 
> <james.marconnet at smdc.army.mil>:
> 
> > Bummer, we thought the nice value would affect the order in which 
> > waiting jobs were assigned to the nodes. Apparently not so.
> > 
> > I searched the Admin Manual on seq_no, and I did not see where that 
> > could be used unless we wanted to give up sorting by load level to 
> > balance out the load on the nodes instead of filling up the first 
> > node completely, then the next one fully, etc. And it's not at all 
> > clear how this would be used anyway. Anyone able to clarify it?
> > 
> > Reading from the Admin manual: 
> > Without any administrator influence, the order is first-in-first-out 
> > (FIFO).
> > 
> > The administrator has the following means to control the job order:
> > - Ticket-based job priority. ....
> > - Urgency-based job priority. ....
> > - POSIX priority.....
> > 
> > Is there an easy way to tie one of these methods to the queue that
> > was specified? I don't want the user to have to specify additional
> > options (that I have to explain and to police) other than the queue,
> > if it can be helped.
> > 
> > And we'd prefer not to suspend jobs, but to let the running jobs
> > complete before starting new jobs. Suspending jobs would wreak havoc
> > on our completion predictions.
> 
> But you mentioned subordinated queues - they will be suspended then.
> 
> CU - Reuti
> 
> > 
> > Perhaps I just want too much!?
> > 
> > Thanks,
> > Jim
> > 
> > -----Original Message-----
> > From: Reuti [mailto:reuti at staff.uni-marburg.de]
> > Sent: Tuesday, April 12, 2005 10:40 AM
> > To: users at gridengine.sunsource.net
> > Subject: Re: [GE users] Does nice always work when determining which 
> > waiting job to assign to a node first?
> > 
> > Hi,
> > 
> > nice is not used for scheduling, but you can give the two queue
> > types different seq_no values so that the nice=0 queue is filled
> > first. But if the nice=19 queue gets suspended - hmm, then it could
> > just as well have a nice value of 0, since it is suspended anyway
> > whenever the nice=0 queue is filled (if I understood you correctly).
> > 
> > CU - Reuti
> > 
> > 
> > Marconnet, James E Mr /Computer Sciences Corporation wrote:
> > > Using 6.0u3. Had some reports yesterday that some waiting jobs
> > > from a queue with nice=0 were still waiting after some waiting jobs
> > > from a different queue with nice=19 started running on nodes
> > > previously running jobs with nice=19. The "wronged" users
> > > figuratively went on the warpath soon afterwards.
> > > 
> > > We are using queue subordination to prevent too many jobs from
> > > different queues from running on the same nodes at the same time,
> > > but that works on a node-by-node basis, and it seemed to be working
> > > OK.
> > > 
> > > Anything in particular I should know about this or look for
> > > settings-wise?
> > > 
> > > Thanks!
> > > Jim Marconnet
