[GE users] Sort by sequence number question

Paul MacInnis macinnis at dal.ca
Mon Jul 16 18:31:33 BST 2007


On Wed, 11 Jul 2007, Paul MacInnis wrote:

> On Wed, 11 Jul 2007, Lönroth Erik wrote:
> 
> > Yes, it is set. Still no luck on this.
> > 
> > The only way I can force the damn slaves off the MASTER node is to remove the requested "PE" explicitly from the nodes. This is not what I would want, but I just can't make it happen otherwise. It simply ignores my sequence number altogether. I have recreated all queues and restarted the qmaster and scheduler, but no luck whatsoever.
> > 
> > Is there anything outside the general queue configuration and the cluster configuration that affects how the "sequence number" is applied?
> 
> I would like to add our experience to this discussion.
> 
> We recently switched from SGE5.3 to 6.1.  We have 1G, 2G and 4G nodes in
> our cluster.  If a job doesn't specify special memory requirements
> we want it scheduled to the smallest memory machine available.
> 
> For 4 years with SGE5.3 this worked well.  We assigned sequence number
> 1965 to the 1G nodes, 2965 to the 2G nodes and 4965 to the 4G nodes.
> 
> With SGE6.1 we defined our queues with
> seq_no  1965,[@2g.hg=2965],[@4g.hg=4965]
> 
> where @2g.hg is the host group of 2G nodes and @4g.hg the host group of 4G nodes.
> 
> With qconf -msconf we defined:
> queue_sort_method     seqno
> 
> qstat -F presents the queues correctly ordered by this seqno.  However
> jobs are being scheduled to 2G and 4G nodes when there are 1G nodes
> available!
> 
> This never happened in SGE5.3!
> 
> It seems that in 6.1 either
> 1. "queue_sort_method  seqno" isn't working for queue selection or
> 2. there is some other queue selection criterion that overrides
>    "queue_sort_method  seqno"
> 
> Any thoughts?
> 
> Paul
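For readers unfamiliar with the per-host-group form of seq_no quoted above, here is a minimal Python sketch of how one such value could be resolved to a sequence number for each host. It assumes the queue_conf(5) format (a default value first, then bracketed [host_or_@hostgroup=value] overrides) and assumes that a host-specific override takes precedence over a host-group one; the host names and group memberships are hypothetical, and the function names are made up for illustration.

```python
import re

# Matches one override such as "[@2g.hg=2965]" (host group) or "[node042=1000]" (host).
OVERRIDE = re.compile(r"\[([^=\]]+)=(\d+)\]")


def parse_seq_no(spec):
    """Split a seq_no value into (default, {host_or_group: value})."""
    first, *overrides = spec.split(",")
    table = {}
    for item in overrides:
        match = OVERRIDE.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"unrecognised override: {item!r}")
        table[match.group(1)] = int(match.group(2))
    return int(first), table


def seq_no_for_host(spec, host, host_groups):
    """Resolve the seq_no of the queue instance on `host`.

    `host_groups` maps a host group name (e.g. "@2g.hg") to its member hosts.
    Assumed precedence: host override, then host group override, then default.
    """
    default, table = parse_seq_no(spec)
    if host in table:
        return table[host]
    for name, value in table.items():
        if name.startswith("@") and host in host_groups.get(name, ()):
            return value
    return default


# Hypothetical host names and group memberships, for illustration only.
SPEC = "1965,[@2g.hg=2965],[@4g.hg=4965]"
GROUPS = {"@2g.hg": {"node101", "node102"}, "@4g.hg": {"node201"}}

for host in ("node001", "node101", "node201"):
    print(host, seq_no_for_host(SPEC, host, GROUPS))
```

With the settings above, a host in neither group falls back to the default 1965, which is what puts the 1G nodes first when instances are ranked by seqno.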

Here's what seems to be happening.

For serial jobs we have 2 cluster queues:  ser.q    bg.q

ser.q is the main serial queue; bg.q (priority 19) is meant to be used
only when the load on a node (load_avg and mem_used) is unexpectedly
light.  Generally the same nodes are assigned to both cluster queues.

ser.q uses seqno 1965, 2965 and 4965 for its 1G, 2G and 4G nodes.

bg.q uses seqno 2969 and 4969 for its 2G and 4G nodes (no 1G nodes).

The intention is that when a serial job appears nodes would be considered
in this order:

1G ser.q,  2G ser.q,  2G bg.q,  4G ser.q,  4G bg.q

However what's happening seems to be this order:

1G ser.q,  2G bg.q,  4G bg.q,  2G ser.q,  4G ser.q

It seems that, for scheduling, cluster queues are considered first in
alphabetical order, and only then, within each cluster queue, are the
queue instances considered in seqno order!

qstat, however, presents the queue instances as intended: strictly by
seqno.

Is the solution to name our cluster queues to alphabetically match the
order we wish them considered by the scheduler?  Or is there some other
setting that we've missed? 
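To make the renaming question concrete, here is a small Python model of the two orderings described above: the intended one, where every queue instance is ranked by seqno alone, and the one hypothesised here, where cluster queues are taken in a fixed (e.g. alphabetical) order and seqno only ranks instances within each. It is a sketch of those descriptions under that assumption, not of the Grid Engine scheduler's actual code; the function names are made up.

```python
from itertools import permutations

# Queue instances from the post: (cluster queue, node memory class, seqno).
INSTANCES = [
    ("ser.q", "1G", 1965),
    ("ser.q", "2G", 2965),
    ("ser.q", "4G", 4965),
    ("bg.q", "2G", 2969),
    ("bg.q", "4G", 4969),
]


def strict_seqno_order(instances):
    """Intended behaviour: every queue instance ranked by seqno alone."""
    return [(q, mem) for q, mem, _ in sorted(instances, key=lambda i: i[2])]


def queue_then_seqno_order(instances, queue_order):
    """Hypothesised behaviour: cluster queues in a fixed order, seqno only within each."""
    rank = {name: pos for pos, name in enumerate(queue_order)}
    ordered = sorted(instances, key=lambda i: (rank[i[0]], i[2]))
    return [(q, mem) for q, mem, _ in ordered]


intended = strict_seqno_order(INSTANCES)
print("intended:", intended)
# Try both possible name orders of the two cluster queues (prints False for both).
for queue_order in permutations(["bg.q", "ser.q"]):
    print(queue_order, queue_then_seqno_order(INSTANCES, queue_order) == intended)
```

Under that model, whichever cluster queue sorts first has all of its instances tried before any instance of the other, so renaming alone could not slot 2G bg.q between 2G ser.q and 4G ser.q as the intended order requires.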

Paul 
