[GE users] Does nice always work when determining which waiting job to assign to a node first?

Charu Chaubal Charu.Chaubal at Sun.COM
Tue Apr 12 21:59:03 BST 2005


Hi Jim,

Marconnet, James E Mr /Computer Sciences Corporation wrote:
> Reuti:
> 
> I have two technical groups, each with what we call a primary que and a
> secondary que. So call the ques G1p, G1s, G2p, G2s. The nodes are split
> evenly between the two groups. G1p contains the same nodes as G2s, except
> with nice=0 or 19 respectively. And so forth. The idea was to allow both
> groups to use as many available nodes as possible, without wasting half the
> nodes when no one in the other group needs to run anything. 
> 
> To prevent running 2 or more jobs on nodes simultaneously, G1p is
> subordinate to G2s, and so forth, per your suggestion in another thread.
> 
> We thought that setting nice would affect the scheduling of waiting jobs to
> nodes. Turns out it does not. 
> 

It sounds like you probably want to set up Grid Engine policies to
handle this.  See, for example,
http://gridengine.sunsource.net/howto/geee.html.  Although you are using
N1GE 6, this HOWTO still applies.

It sounds like you want to have Department-based functional scheduling,
but possibly the share tree could be of interest to you too.

In general, the N1GE policies are meant to address specifically the
kinds of sharing scenarios you are describing.
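
To illustrate why a sharing policy helps here, below is a minimal Python
sketch. It is only a toy model, not the actual N1GE ticket calculation:
the function names (fifo_order, share_order), the job records, and the
equal shares are all assumptions made up for illustration. It contrasts
plain FIFO dispatch with a simple share-based order in which the group
that has started the fewest jobs relative to its share goes next.

```python
from collections import deque


def fifo_order(jobs):
    """Dispatch order with no policy at all: submission time only."""
    return [job["id"] for job in sorted(jobs, key=lambda j: j["submit"])]


def share_order(jobs, shares):
    """Toy share-based order (hypothetical, not the N1GE algorithm).

    Repeatedly start the oldest pending job of the group whose
    started-jobs-to-share ratio is currently lowest. Ties are broken
    by group name so the result is deterministic.
    """
    pending = {}
    for job in sorted(jobs, key=lambda j: j["submit"]):
        pending.setdefault(job["group"], deque()).append(job["id"])

    started = {group: 0 for group in pending}
    order = []
    while any(pending.values()):
        candidates = [g for g in pending if pending[g]]
        group = min(candidates, key=lambda g: (started[g] / shares[g], g))
        order.append(pending[group].popleft())
        started[group] += 1
    return order


# G1 submits four jobs first, then G2 submits two.
jobs = [{"id": f"G1-{i}", "group": "G1", "submit": i} for i in range(4)]
jobs += [
    {"id": "G2-0", "group": "G2", "submit": 4},
    {"id": "G2-1", "group": "G2", "submit": 5},
]

print(fifo_order(jobs))
print(share_order(jobs, {"G1": 1, "G2": 1}))
```

With FIFO, both G2 jobs wait behind all four G1 jobs. With equal shares
in this toy model, the order alternates (G1-0, G2-0, G1-1, G2-1, G1-2,
G1-3), which is roughly the behaviour the functional policy is meant to
give you without touching nice or suspending anything.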

Regards,
	Charu


> So if G1 submits a gazillion jobs to both his group's primary and secondary
> ques, then all the cluster nodes get used very efficiently, but nobody in G2
> gets to start even one job till all the G1 jobs begin and then at least one
> job finishes.
> 



> Yes, we have load thresholds set (since there is some interactive testing
> and since some nodes are dual-processor, hyperthreaded - another story!),
> but we are not currently suspending jobs. Some users have short jobs, and
> some have much longer jobs. So suspending jobs seemed iffy.
> 
> Probably clear as mud, but hope it helps someone understand and suggest an
> approach.
> 
> Jim
> 
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: Tuesday, April 12, 2005 1:36 PM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] Does nice always work when determining which
> waiting job to assign to a node first?
> 
> Hi Jim,
> 
> I'm still not sure about your intended setup. You have two cluster queues -
> one with nice=0, the other with nice=19 and any load_thresholds and
> subordination?
> 
> One user will submit qsub -q nice0que, the other qsub -q nice19que?
> 
> Quoting "Marconnet, James E Mr /Computer Sciences Corporation" 
> <james.marconnet at smdc.army.mil>:
> 
> 
>>Bummer, we thought the nice value would affect the order in which 
>>waiting jobs were assigned to the nodes. Apparently not so.
>>
>>I searched the Admin Manual on seq_no, and I did not see where that 
>>could be used unless we wanted to give up sorting by load level to 
>>balance out the load on the nodes instead of filling up the first node 
>>completely, then the next one fully, etc. And it's not at all clear 
>>how this would be used anyway. Anyone able to clarify it?
>>
>>Reading from the Admin manual: 
>>Without any administrator influence, the order is first-in-first-out 
>>(FIFO).
>>
>>The administrator has the following means to control the job order:
>>- Ticket-based job priority. ....
>>- Urgency-based job priority. ....
>>- POSIX priority.....
>>
>>Is there an easy way to tie one of these methods to the que which was 
>>specified? I don't want the user to have to specify additional options 
>>(that I have to explain and to police) other than the que if it can be 
>>helped.
>>
>>And we'd prefer not to suspend jobs, but to let the running jobs 
>>complete before starting new jobs. Suspending jobs would wreak havoc 
>>on our completion predictions.
> 
> 
> But you mentioned subordinated queues - they will be suspended then.
> 
> CU - Reuti
> 
> 
>>Perhaps I just want too much!?
>>
>>Thanks,
>>Jim
>>
>>-----Original Message-----
>>From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>Sent: Tuesday, April 12, 2005 10:40 AM
>>To: users at gridengine.sunsource.net
>>Subject: Re: [GE users] Does nice always work when determining which 
>>waiting job to assign to a node first?
>>
>>Hi,
>>
>>nice is not used for scheduling, but you can use a seq_no for the two 
>>queue types, to fill first the nice=0 queue. But suspending a nice=19 
>>queue - mhh then this queue could also have just nice of 0, as it's 
>>suspended anyway, if the nice=0 queue is filled (if I got you 
>>correctly).
>>
>>CU - Reuti
>>
>>
>>Marconnet, James E Mr /Computer Sciences Corporation wrote:
>>
>>>Using 6.0u3. Had some reports yesterday that some waiting jobs from 
>>>a que with nice=0 were still waiting after some waiting jobs from a 
>>>different que with nice=19 started running on nodes previously 
>>>running jobs with nice=19. The "wronged" users figuratively went on 
>>>the warpath soon afterwards.
>>>
>>>We are using que subordination to prevent too many jobs from 
>>>different ques from running on the same nodes at the same time, but 
>>>that works on a node-by-node basis, and it seemed to be working OK.
>>>
>>>Anything in particular I should know about this or to look for 
>>>settings-wise?
>>>
>>>Thanks!
>>>Jim Marconnet
>>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
> 
> 

-- 
####################################################################
# Charu V. Chaubal              # Phone: (650) 786-7672 (x87672)   #
# Grid Computing Technologist   # Fax:   (650) 786-4591            #
# Sun Microsystems, Inc.        # Email: charu.chaubal at sun.com     #
####################################################################

