[GE users] Sort by sequence number question

Paul MacInnis macinnis at dal.ca
Tue Jul 17 15:54:40 BST 2007


Hi Andreas,

On Tue, 17 Jul 2007 Andreas.Haas at Sun.COM wrote:

> Hi Paul,
> 
> On Mon, 16 Jul 2007, Paul MacInnis wrote:
> 
> >
> > Here's what seems to be happening.
> >
> > For serial jobs we have 2 cluster queues:  ser.q    bg.q
> >
> > ser.q is the main serial queue; bg.q (priority 19) is meant to be used
> > only when the load on a node (load_avg and mem_used) is unexpectedly
> > light.  Generally the same nodes are assigned to each cluster queue.
> >
> > ser.q uses seqno 1965, 2965 and 4965 for its 1G, 2G and 4G nodes.
> >
> > bg.q uses seqno 2969 and 4969 for its 2G and 4G nodes (no 1G nodes).
> >
> > The intention is that when a serial job appears nodes would be considered
> > in this order:
> >
> > 1G ser.q,  2G ser.q,  2G bg.q,  4G ser.q,  4G bg.q
> >
> > However what's happening seems to be this order:
> >
> > 1G ser.q,  2G bg.q,  4G bg.q,  2G ser.q,  4G ser.q
> >
> > It seems that, for scheduling, cluster queues are considered first in
> > alphabetical order, and only then, within each cluster queue, are
> > queue instances considered in seqno order!
> >
> > qstat however presents queue instances as intended - strictly by
> > seqno.
> >
> > Is the solution to name our cluster queues to alphabetically match the
> > order we wish them considered by the scheduler?  Or is there some other
> > setting that we've missed?
> 
> I can not reproduce this. Here is my queue set-up:
> 
>     > qconf -ssconf | grep sort
>     queue_sort_method                 seqno
> 
>     > qconf -shgrp @oneG
>     group_name @oneG
>     hostlist angbor
> 
>     > qconf -shgrp @twoG
>     group_name @twoG
>     hostlist es-ergb01-01
> 
>     > qconf -shgrp @fourG
>     group_name @fourG
>     hostlist baumbart
> 
>     > qconf -sq test_ser.q | egrep "hostlist|seq|load_thre|slots"
>     hostlist              @oneG @twoG @fourG
>     seq_no                0,[@oneG=1965],[@twoG=2965],[@fourG=4965]
>     load_thresholds       NONE
>     slots                 1
> 
>     > qconf -sq test_bg.q | egrep "hostlist|seq|load_thre|slots"
>     hostlist              @twoG @fourG
>     seq_no                0,[@twoG=2969],[@fourG=4969]
>     load_thresholds       NONE
>     slots                 1
> 
> when I submit
> 
>     > qsub -t 1-5 -q 'test_*' -b y /bin/sleep 5
>     Your job-array 528.1-5:1 ("sleep") has been submitted
> 
> I get queues filled in the order of the array task indices
> 
>     > qstat -f -q 'test_*'
>     queuename                      qtype used/tot. load_avg arch          states
>     ----------------------------------------------------------------------------
>     test_ser.q@angbor              BIP   1/1       0.04     lx24-x86
>         528 0.55500 sleep      ah114088     t     07/17/2007 16:02:19     1 1
>     ----------------------------------------------------------------------------
>     test_ser.q@es-ergb01-01        BIP   1/1       0.42     sol-sparc64
>         528 0.55500 sleep      ah114088     t     07/17/2007 16:02:19     1 2
>     ----------------------------------------------------------------------------
>     test_bg.q@es-ergb01-01         BIP   1/1       0.42     sol-sparc64
>         528 0.55500 sleep      ah114088     t     07/17/2007 16:02:19     1 3
>     ----------------------------------------------------------------------------
>     test_ser.q@baumbart            BIP   1/1       0.19     irix65
>         528 0.55500 sleep      ah114088     t     07/17/2007 16:02:19     1 4
>     ----------------------------------------------------------------------------
>     test_bg.q@baumbart             BIP   1/1       0.19     irix65
>         528 0.55500 sleep      ah114088     t     07/17/2007 16:02:19     1 5
> 
> and the same is true with plain sequential jobs
> 
>     > ntimes 5 qsub -q 'test_*' -b y /bin/sleep 5
>     Your job 534 ("sleep") has been submitted
>     Your job 535 ("sleep") has been submitted
>     Your job 536 ("sleep") has been submitted
>     Your job 537 ("sleep") has been submitted
>     Your job 538 ("sleep") has been submitted
> 
>     > qstat -f -q 'test_*'
>     queuename                      qtype used/tot. load_avg arch          states
>     ----------------------------------------------------------------------------
>     test_ser.q@angbor              BIP   1/1       0.10     lx24-x86
>         534 0.55500 sleep      ah114088     r     07/17/2007 16:07:09     1
>     ----------------------------------------------------------------------------
>     test_ser.q@es-ergb01-01        BIP   1/1       0.34     sol-sparc64
>         535 0.55500 sleep      ah114088     r     07/17/2007 16:07:09     1
>     ----------------------------------------------------------------------------
>     test_bg.q@es-ergb01-01         BIP   1/1       0.34     sol-sparc64
>         536 0.55500 sleep      ah114088     t     07/17/2007 16:07:09     1
>     ----------------------------------------------------------------------------
>     test_ser.q@baumbart            BIP   1/1       0.19     irix65
>         537 0.55500 sleep      ah114088     t     07/17/2007 16:07:09     1
>     ----------------------------------------------------------------------------
>     test_bg.q@baumbart             BIP   1/1       0.19     irix65
>         538 0.55500 sleep      ah114088     t     07/17/2007 16:07:09     1
> 
> I did this with N1GE 6.1
> 
> Could it be that jobs are submitted with the -soft option to specify some
> preference? Or are you using some over-sensitive load thresholds?

This is a pretty good duplication of our setup here.  However, on your
"qsub" you use "-q 'test_*'" to select a domain of queues for the
scheduler.  We don't use -q at all; the scheduler chooses cluster
queues as it pleases.  So I would claim that by default the scheduler
considers one cluster queue at a time to satisfy a request, and only
looks at additional cluster queues if no instance of that queue
satisfies it.
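
To make the two candidate orderings concrete, here is a small Python
sketch.  It is only a model of the claim, not of the real scheduler:
the tuples reuse the seqno values from our setup, and the function
names are made up for illustration.

```python
# Queue instances as (cluster queue, node class, seq_no), taken from the
# setup described earlier in this thread.
INSTANCES = [
    ("ser.q", "1G", 1965),
    ("ser.q", "2G", 2965),
    ("ser.q", "4G", 4965),
    ("bg.q", "2G", 2969),
    ("bg.q", "4G", 4969),
]


def order_by_seqno(instances):
    """Intended behaviour: one global sort on seq_no across all queues."""
    return sorted(instances, key=lambda qi: qi[2])


def order_by_cluster_queue(instances):
    """Hypothesised behaviour (an assumption, not confirmed): cluster
    queues are taken one at a time in name order, and seq_no only
    orders the instances inside each cluster queue."""
    return sorted(instances, key=lambda qi: (qi[0], qi[2]))


def labels(instances):
    """Render instances in the 'size queue' form used in this thread."""
    return [f"{size} {queue}" for queue, size, _ in instances]


if __name__ == "__main__":
    print("global seqno:     ", ", ".join(labels(order_by_seqno(INSTANCES))))
    print("per cluster queue:", ", ".join(labels(order_by_cluster_queue(INSTANCES))))
```

The global sort gives the order we intended.  The per-cluster-queue
model puts both bg.q instances ahead of every ser.q instance, so it
explains bg.q jumping ahead of the 2G and 4G serial instances, but it
does not by itself explain the 1G serial instance being picked first.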

Try something that doesn't select queues by name, perhaps a
resource request that is satisfied only by these test queues.
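
As a rough sketch of what such a submission could look like, the
Python snippet below only builds and prints a qsub command line; it
does not run anything.  The resource name "testq" is hypothetical: it
assumes a boolean complex of that name has been defined and attached
to the test queues through their complex_values, which is not
something shown in this thread.

```python
import shlex


def build_qsub(resource_request, command):
    """Build a qsub command that restricts the job with -l instead of -q.

    resource_request is a 'name=value' string for a complex that only
    the test queues provide (hypothetical here).
    """
    return ["qsub", "-l", resource_request, "-b", "y", *command]


if __name__ == "__main__":
    # "testq=true" is an assumed, site-defined resource, not a built-in one.
    cmd = build_qsub("testq=true", ["/bin/sleep", "5"])
    print(shlex.join(cmd))
```

With no -q on the command line the scheduler is free to pick among all
cluster queues that offer the resource, which is the situation we have
here.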

Paul
