[GE users] Sort by sequence number question

Erik Lönroth erik.lonroth at scania.com
Tue Jul 17 08:37:02 BST 2007

Just when I started to get over this and return to my life, this issue
arises again. I never managed to solve it, and I am seeing almost the
same behavior as you.

It seems the scheduler sorts things alphabetically/numerically also
within the cluster queue.

No matter how I modify the sequence number, my node ts101-1-0 is
selected before ts101-1-1. The only way I could stop it was to remove
the PE resources from the specific nodes.

My goal is to get it like this.  [<seqno>,<node>]

master.q - [99,ts101-1-0]
short.q - [1,ts101-1-0  0,ts101-1-1] 

This way the node ts101-1-1 would be filled up before ts101-1-0. That
never happens: ts101-1-0 always fills up first (unless I use
$round_robin, which is wrong for my application).
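
To illustrate the ordering described above, here is a minimal Python sketch. It is not how the Grid Engine scheduler is implemented; it only contrasts a sort on seqno (the intended order) with a sort on host name (which would reproduce the reported behavior). The seqno values and host names are taken from the listing above; the tuple layout and function names are assumptions made for the illustration.

```python
# Queue instances from the listing above as (seqno, cluster queue, host).
instances = [
    (99, "master.q", "ts101-1-0"),
    (1, "short.q", "ts101-1-0"),
    (0, "short.q", "ts101-1-1"),
]


def by_seqno(items):
    """Order queue instances by ascending sequence number (intended order)."""
    return sorted(items, key=lambda qi: qi[0])


def by_hostname(items):
    """Order queue instances by host name, ignoring seqno (assumed model
    of the behavior that is actually reported)."""
    return sorted(items, key=lambda qi: qi[2])


# Only the short.q instances compete for the slave tasks in this example.
short_q = [qi for qi in instances if qi[1] == "short.q"]

intended = [host for _, _, host in by_seqno(short_q)]
observed = [host for _, _, host in by_hostname(short_q)]

print("intended:", intended)
print("observed:", observed)
```

With seqno as the key, ts101-1-1 comes first; with the host name as the key, ts101-1-0 comes first regardless of the sequence numbers.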

I have banged my head into this so much and I KNOW there has to be
something wrong, somehow, somewhere...

I hope someone with more experience and knowledge will crack the nut.


On Mon, 2007-07-16 at 14:31 -0300, Paul MacInnis wrote:
> On Wed, 11 Jul 2007, Paul MacInnis wrote:
> > On Wed, 11 Jul 2007, Lönroth Erik wrote:
> > 
> > > yes, it is set. Still no luck on this.
> > > 
> > > The only way I can force the damn slaves off the MASTER node is to remove the requested "PE" explicitly from the nodes. This is not what I want, but I just can't make it happen any other way. It simply ignores my sequence number altogether. I have recreated all queues and restarted the qmaster and scheduler, but no luck whatsoever.
> > > 
> > > Is there something else that influences how the "sequence number" is applied, outside of the general queue configuration and the cluster config?
> > 
> > I would like to add our experience to this discussion.
> > 
> > We recently switched from SGE5.3 to 6.1.  We have 1G, 2G and 4G nodes in
> > our cluster.  If a job doesn't specify special memory requirements
> > we want it scheduled to the smallest memory machine available.
> > 
> > For 4 years with SGE5.3 this worked well.  We assigned sequence number
> > 1965 to the 1G nodes, 2965 to the 2G nodes and 4965 to the 4G nodes.
> > 
> > With SGE6.1 we defined our queues with
> > seq_no  1965,[@2g.hg=2965],[@4g.hg=4965]
> > 
> > @2g.hg being the 2G host group nodes and @4g.hg being the 4G nodes.
> > 
> > With qconf -msconf we defined:
> > queue_sort_method     seqno
> > 
> > qstat -F presents the queues correctly ordered by this seqno.  However
> > jobs are being scheduled to 2G and 4G nodes when there are 1G nodes
> > available!
> > 
> > This never happened in SGE5.3!
> > 
> > It seems that in 6.1 either
> > 1. "queue_sort_method  seqno" isn't working for queue selection or
> > 2. there is some other queue selection criterion that overrides
> >    "queue_sort_method  seqno"
> > 
> > Any thoughts?
> > 
> > Paul
> Here's what seems to be happening.
> For serial jobs we have 2 cluster queues:  ser.q    bg.q
> ser.q is the main serial queue; bg.q (priority 19) is meant to be used
> only when the load on a node (load_avg and mem_used) is unexpectedly
> light.  Generally the same nodes are assigned to each cluster queue.
> ser.q uses seqno 1965, 2965 and 4965 for its 1G, 2G and 4G nodes.
> bg.q uses seqno 2969 and 4969 for its 2G and 4G nodes (no 1G nodes).
> The intention is that when a serial job appears, nodes would be considered
> in this order:
> 1G ser.q,  2G ser.q,  2G bg.q,  4G ser.q,  4G bg.q
> However, what's happening seems to be this order:
> 1G ser.q,  2G bg.q,  4G bg.q,  2G ser.q,  4G ser.q
> It seems that for scheduling, cluster queues are considered first in
> alphabetical order, and only then, within each cluster queue, are queue
> instances considered in seqno order!
> qstat however presents queue instances as intended - strictly by
> seqno.  
> Is the solution to name our cluster queues to alphabetically match the
> order we wish them considered by the scheduler?  Or is there some other
> setting that we've missed? 
> Paul 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
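
As an illustration of the seq_no line in the quoted message, the following Python sketch parses a value of the form `1965,[@2g.hg=2965],[@4g.hg=4965]` into a default plus per-host-group overrides and then resolves a sequence number per host. It is a simplified model, not Grid Engine's own parser or precedence rules. The host names and their group membership are hypothetical; only the seq_no string and the host group names come from the thread.

```python
import re


def parse_seq_no(value):
    """Split a seq_no value into (default, {target: override})."""
    default_part, _, rest = value.partition(",")
    overrides = {
        target: int(number)
        for target, number in re.findall(r"\[([^=\]]+)=(\d+)\]", rest)
    }
    return int(default_part), overrides


def seqno_for(hostgroups, default, overrides):
    """Return the override of the first matching host group, else the default.

    Assumption: a host belongs to at most one overridden group, so the
    question of precedence between several matches does not arise here.
    """
    for group in hostgroups:
        if group in overrides:
            return overrides[group]
    return default


default, overrides = parse_seq_no("1965,[@2g.hg=2965],[@4g.hg=4965]")

# Hypothetical hosts and their host group membership (not from the thread).
hosts = {
    "node-4g": ["@4g.hg"],
    "node-1g": [],
    "node-2g": ["@2g.hg"],
}

resolved = {h: seqno_for(groups, default, overrides) for h, groups in hosts.items()}

# Hosts in the order a pure seqno sort would consider them.
ordered = sorted(resolved, key=resolved.get)

print(resolved)
print(ordered)
```

Under these assumptions, the host outside both groups gets the default 1965 and is considered first, followed by the 2G and then the 4G host, which is the order the quoted message says was intended.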

