[GE users] Sort by sequence number question

Paul MacInnis macinnis at dal.ca
Wed Jul 11 13:59:47 BST 2007


On Wed, 11 Jul 2007, Reuti wrote:

> On 11.07.2007 at 13:42, Paul MacInnis wrote:
> 
> > On Wed, 11 Jul 2007, Lönroth Erik wrote:
> >
> >> yes, it is set. Still no luck on this.
> >>
> >> The only way I can force the damn slaves off the MASTER node is
> >> to remove the requested "PE" explicitly from the nodes. This is
> >> not what I want, but I just can't make it happen any other way.
> >> It simply ignores my sequence number altogether. I have recreated
> >> all queues and restarted the qmaster and scheduler, but no luck
> >> whatsoever.
> >>
> >> Is there something else affecting the effect of "sequence number"  
> >> outside of the general queue configuration and the cluster config?
> >
> > I would like to add our experience to this discussion.
> >
> > We recently switched from SGE5.3 to 6.1.  We have 1G, 2G and 4G nodes in
> > our cluster.  If a job doesn't specify special memory requirements
> > we want it scheduled to the smallest memory machine available.
> >
> > For 4 years with SGE5.3 this worked well.  We assigned sequence number
> > 1965 to the 1G nodes, 2965 to the 2G nodes and 4965 to the 4G nodes.
> >
> > With SGE6.1 we defined our queues with
> > seq_no  1965,[@2g.hg=2965],[@4g.hg=4965]
> >
> > @2g.hg being the 2G host group nodes and @4g.hg being the 4G nodes.
> >
> > With qconf -msconf we defined:
> > queue_sort_method     seqno
> >
> > qstat -F presents the queues correctly ordered by this seqno.  However
> > jobs are being scheduled to 2G and 4G nodes when there are 1G nodes
> > available!
> >
> > This never happened in SGE5.3!
> >
> > It seems that in 6.1 either
> > 1. "queue_sort_method  seqno" isn't working for queue selection or
> > 2. there is some other queue selection criterion that overrides
> >    "queue_sort_method  seqno"
> 
> I don't see this with our setup. Do you have any default requests  
> (either in the complex definition (qconf -sc) or the sge_request file)?

No.  But we followed the Admin Guide's example for "user-based equal
share" and set:

qconf -mconf
enforce_user auto
auto_user_fshare 100

qconf -msconf
weight_tickets_functional 10000

Also we defined a resource quota set thus:
master:~$  qconf -srqs
perUserSlotLimit
{
   name         perUserSlotLimit
   description  limit slots per user
   enabled      TRUE
   limit        users {*} to slots=40
}

I assume that the domain for these shares and quotas is the whole cluster
and not just a single host group ...
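
To spell out the behaviour we expect from "queue_sort_method seqno", here
is a toy model in Python.  It is only a sketch of the expectation described
in my earlier message, not how the SGE scheduler is actually implemented;
the host names and free-slot counts are made up for illustration, and only
the seq_no values come from our configuration.

```python
# Toy model only: hypothetical hosts and slot counts, not SGE internals.
from typing import NamedTuple


class QueueInstance(NamedTuple):
    host: str        # made-up host name
    seq_no: int      # seq_no as configured on the cluster queue
    free_slots: int  # made-up number of currently free slots


def pick_queue(instances):
    """Return the instance with the lowest seq_no that still has a free
    slot, or None if nothing is free."""
    candidates = [q for q in instances if q.free_slots > 0]
    return min(candidates, key=lambda q: q.seq_no, default=None)


instances = [
    QueueInstance("node4g-01", 4965, 2),
    QueueInstance("node2g-01", 2965, 2),
    QueueInstance("node1g-01", 1965, 1),
    QueueInstance("node1g-02", 1965, 0),  # full, so it must be skipped
]

chosen = pick_queue(instances)
print(chosen.host)  # prints "node1g-01"
```

Under this model a 2G or 4G node is only chosen once every 1G node is
full, which is what we saw for 4 years with 5.3 and are not seeing in 6.1.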

Paul

