[GE users] Sort by sequence number question
macinnis at dal.ca
Mon Jul 16 18:31:33 BST 2007
On Wed, 11 Jul 2007, Paul MacInnis wrote:
> On Wed, 11 Jul 2007, Lönroth Erik wrote:
> > yes, it is set. Still no luck on this.
> > The only way I can force the damn slaves off the MASTER node is to remove the requested "PE" explicitly from the nodes. This is not what I want, but I just can't make it happen any other way. It simply ignores my sequence number altogether. I have recreated all queues and restarted the qmaster and scheduler, but no luck whatsoever.
> > Is there something else affecting the effect of "sequence number" outside of the general queue configuration and the cluster config?
> I would like to add our experience to this discussion.
> We recently switched from SGE5.3 to 6.1. We have 1G, 2G and 4G nodes in
> our cluster. If a job doesn't specify special memory requirements
> we want it scheduled to the smallest memory machine available.
> For 4 years with SGE5.3 this worked well. We assigned sequence number
> 1965 to the 1G nodes, 2965 to the 2G nodes and 4965 to the 4G nodes.
> With SGE6.1 we defined our queues with
> seq_no 1965,[@2g.hg=2965],[@4g.hg=4965]
> @2g.hg being the 2G host group nodes and @4g.hg being the 4G nodes.
> With qconf -msconf we defined:
> queue_sort_method seqno
> qstat -F presents the queues correctly ordered by this seqno. However
> jobs are being scheduled to 2G and 4G nodes when there are 1G nodes
> available. This never happened in SGE5.3!
> It seems that in 6.1 either
> 1. "queue_sort_method seqno" isn't working for queue selection or
> 2. there is some other queue selection criterion that overrides
> "queue_sort_method seqno"
> Any thoughts?
Here's what seems to be happening.
For serial jobs we have 2 cluster queues: ser.q bg.q
ser.q is the main serial queue; bg.q (priority 19) is meant to be used
only when the load on a node (load_avg and mem_used) is unexpectedly
light. Generally the same nodes are assigned to each cluster queue.
ser.q uses seqno 1965, 2965 and 4965 for its 1G, 2G and 4G nodes.
bg.q uses seqno 2969 and 4969 for its 2G and 4G nodes (no 1G nodes).
The intention is that when a serial job appears, nodes would be considered
in this order:
1G ser.q, 2G ser.q, 2G bg.q, 4G ser.q, 4G bg.q
However, what's happening seems to be this order:
1G ser.q, 2G bg.q, 4G bg.q, 2G ser.q, 4G ser.q
It seems that for scheduling, cluster queues are considered first in
alphabetical order, and only then, within each cluster queue, are queue
instances considered in seqno order!
qstat, however, presents queue instances as intended - strictly by seqno.
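To make the two orderings concrete, here is a rough Python sketch. It is not Grid Engine code and says nothing about how the scheduler is actually implemented; it only applies the two sort rules discussed above to the seqno values from this message, using the cluster queue names ser.q and bg.q. The tuple layout and the label() helper are made up for the illustration.

```python
# Illustration only: not Grid Engine code, just two sort rules applied
# to the seqno values quoted in this message.
# Each tuple is (cluster queue, node class, seq_no).
queue_instances = [
    ("ser.q", "1G", 1965),
    ("ser.q", "2G", 2965),
    ("ser.q", "4G", 4965),
    ("bg.q", "2G", 2969),
    ("bg.q", "4G", 4969),
]


def label(qi):
    """Hypothetical helper: render a queue instance as '<node> <queue>'."""
    queue, node, _seqno = qi
    return f"{node} {queue}"


# Intended behaviour: rank every queue instance by seq_no alone.
by_seqno = [label(qi) for qi in sorted(queue_instances, key=lambda qi: qi[2])]

# Hypothesis: cluster queue name first, seq_no only within a cluster queue.
by_name_then_seqno = [
    label(qi) for qi in sorted(queue_instances, key=lambda qi: (qi[0], qi[2]))
]

print("seqno only:      ", ", ".join(by_seqno))
print("name, then seqno:", ", ".join(by_name_then_seqno))
```

The seqno-only ordering reproduces the intended order listed above. A strict name-then-seqno ordering would put both bg.q instances ahead of the 1G ser.q instance, which is not quite the order we observe, so the alphabetical explanation may be only part of the story.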
Is the solution to name our cluster queues to alphabetically match the
order we wish them considered by the scheduler? Or is there some other
setting that we've missed?