[GE users] Sort by sequence number question

Andreas.Haas at Sun.COM Andreas.Haas at Sun.COM
Tue Jul 17 15:12:22 BST 2007


Hi Paul,

On Mon, 16 Jul 2007, Paul MacInnis wrote:

> On Wed, 11 Jul 2007, Paul MacInnis wrote:
>
>> On Wed, 11 Jul 2007, Lönroth Erik wrote:
>>
>>> yes, it is set. Still no luck on this.
>>>
>>> The only way I can force the damn slaves off the MASTER node is to
>>> remove the requested "PE" explicitly from the nodes. This is not what
>>> I want, but I just can't make it happen any other way. It simply
>>> ignores my sequence number altogether. I have recreated all queues and
>>> restarted the qmaster and scheduler, but no luck whatsoever.
>>>
>>> Is there something else affecting the effect of "sequence number" outside of the general queue configuration and the cluster config?
>>
>> I would like to add our experience to this discussion.
>>
>> We recently switched from SGE5.3 to 6.1.  We have 1G, 2G and 4G nodes in
>> our cluster.  If a job doesn't specify special memory requirements
>> we want it scheduled to the smallest memory machine available.
>>
>> For 4 years with SGE5.3 this worked well.  We assigned sequence number
>> 1965 to the 1G nodes, 2965 to the 2G nodes and 4965 to the 4G nodes.
>>
>> With SGE6.1 we defined our queues with
>> seq_no  1965,[@2g.hg=2965],[@4g.hg=4965]
>>
>> @2g.hg being the 2G host group nodes and @4g.hg being the 4G nodes.
>>
>> With qconf -msconf we defined:
>> queue_sort_method     seqno
>>
>> qstat -F presents the queues correctly ordered by this seqno.  However
>> jobs are being scheduled to 2G and 4G nodes when there are 1G nodes
>> available!
>>
>> This never happened in SGE5.3!
>>
>> It seems that in 6.1 either
>> 1. "queue_sort_method  seqno" isn't working for queue selection or
>> 2. there is some other queue selection criteria that overrides
>>    "queue_sort_method  seqno"
>>
>> Any thoughts?
>>
>> Paul
>
> Here's what seems to be happening.
>
> For serial jobs we have 2 cluster queues:  ser.q    bg.q
>
> ser.q is the main serial queue; bg.q (priority 19) is meant to be used
> only when the load on a node (load_avg and mem_used) is unexpectedly
> light.  Generally the same nodes are assigned to each cluster queue.
>
> ser.q uses seqno 1965, 2965 and 4965 for its 1G, 2G and 4G nodes.
>
> bg.q uses seqno 2969 and 4969 for its 2G and 4G nodes (no 1G nodes).
>
> The intention is that when a serial job appears, nodes would be considered
> in this order:
>
> 1G ser.q,  2G ser.q,  2G bg.q,  4G ser.q,  4G bg.q
>
> However what's happening seems to be this order:
>
> 1G ser.q,  2G bg.q,  4G bg.q,  2G ser.q,  4G ser.q
>
> It seems that for scheduling, cluster queues are considered first, in
> alphabetical order, and only then, within each cluster queue, are the
> queue instances considered in seqno order!
>
> qstat however presents queue instances as intended - strictly by
> seqno.
>
> Is the solution to name our cluster queues to alphabetically match the
> order we wish them considered by the scheduler?  Or is there some other
> setting that we've missed?

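To make the two orderings under discussion concrete, here is a minimal Python sketch. It is only a model, not scheduler code: the instance list is taken from the seqno values quoted above (using the queue names ser.q and bg.q), and the function names are made up for illustration. It prints what a strict sort by seqno would give and what a "cluster queue name first, then seqno" sort would give, so either can be compared with the observed order.

```python
# Queue instances as described in the quoted mail:
# (cluster queue, memory class, seq_no)
INSTANCES = [
    ("ser.q", "1G", 1965),
    ("ser.q", "2G", 2965),
    ("ser.q", "4G", 4965),
    ("bg.q", "2G", 2969),
    ("bg.q", "4G", 4969),
]


def label(instance):
    """Render an instance the way the mail writes it, e.g. '2G bg.q'."""
    queue, mem_class, _seq_no = instance
    return f"{mem_class} {queue}"


def strict_seqno_order(instances):
    """Order queue instances purely by ascending seq_no."""
    return [label(i) for i in sorted(instances, key=lambda i: i[2])]


def queue_name_then_seqno_order(instances):
    """Order by cluster queue name first, then by seq_no within a queue."""
    return [label(i) for i in sorted(instances, key=lambda i: (i[0], i[2]))]


if __name__ == "__main__":
    print("strict seqno:     ", ", ".join(strict_seqno_order(INSTANCES)))
    print("name, then seqno: ", ", ".join(queue_name_then_seqno_order(INSTANCES)))
```

In this model the strict sort yields the intended order, while the name-first sort puts both bg.q instances ahead of the 1G ser.q instance, because "bg.q" sorts before "ser.q".
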
I cannot reproduce this. Here is my queue set-up:

    > qconf -ssconf | grep sort
    queue_sort_method                 seqno

    > qconf -shgrp @oneG
    group_name @oneG
    hostlist angbor

    > qconf -shgrp @twoG
    group_name @twoG
    hostlist es-ergb01-01

    > qconf -shgrp @fourG
    group_name @fourG
    hostlist baumbart

    > qconf -sq test_ser.q | egrep "hostlist|seq|load_thre|slots"
    hostlist              @oneG @twoG @fourG
    seq_no                0,[@oneG=1965],[@twoG=2965],[@fourG=4965]
    load_thresholds       NONE
    slots                 1

    > qconf -sq test_bg.q | egrep "hostlist|seq|load_thre|slots"
    hostlist              @twoG @fourG
    seq_no                0,[@twoG=2969],[@fourG=4969]
    load_thresholds       NONE
    slots                 1

when I submit

    > qsub -t 1-5 -q 'test_*' -b y /bin/sleep 5
    Your job-array 528.1-5:1 ("sleep") has been submitted

I get queues filled in the order of the array task indices

    > qstat -f -q 'test_*'
    queuename                      qtype used/tot. load_avg arch          states
    ----------------------------------------------------------------------------
    test_ser.q@angbor              BIP   1/1       0.04     lx24-x86
        528 0.55500 sleep      ah114088     t     07/17/2007 16:02:19     1 1
    ----------------------------------------------------------------------------
    test_ser.q@es-ergb01-01        BIP   1/1       0.42     sol-sparc64
        528 0.55500 sleep      ah114088     t     07/17/2007 16:02:19     1 2
    ----------------------------------------------------------------------------
    test_bg.q@es-ergb01-01         BIP   1/1       0.42     sol-sparc64
        528 0.55500 sleep      ah114088     t     07/17/2007 16:02:19     1 3
    ----------------------------------------------------------------------------
    test_ser.q@baumbart            BIP   1/1       0.19     irix65
        528 0.55500 sleep      ah114088     t     07/17/2007 16:02:19     1 4
    ----------------------------------------------------------------------------
    test_bg.q@baumbart             BIP   1/1       0.19     irix65
        528 0.55500 sleep      ah114088     t     07/17/2007 16:02:19     1 5
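
The fill order above is what a plain sort by seq_no predicts for this set-up. As a rough cross-check, the following Python sketch resolves the seq_no attribute with its host group overrides into per-instance values and sorts them. It is a simplified model: parse_seq_no and fill_order are hypothetical helper names, only the "default,[@group=value]" form used in this mail is handled, and the real scheduler of course takes more than seq_no into account.

```python
import re

# Host groups and queue attributes copied from the qconf output in this mail.
HOSTGROUPS = {
    "@oneG": ["angbor"],
    "@twoG": ["es-ergb01-01"],
    "@fourG": ["baumbart"],
}

QUEUES = {
    "test_ser.q": {
        "hostlist": ["@oneG", "@twoG", "@fourG"],
        "seq_no": "0,[@oneG=1965],[@twoG=2965],[@fourG=4965]",
    },
    "test_bg.q": {
        "hostlist": ["@twoG", "@fourG"],
        "seq_no": "0,[@twoG=2969],[@fourG=4969]",
    },
}


def parse_seq_no(value):
    """Split 'default,[@group=n],...' into (default, {group: n})."""
    default_part, *override_parts = value.split(",")
    overrides = {}
    for part in override_parts:
        match = re.fullmatch(r"\[(@?[^=\]]+)=(\d+)\]", part.strip())
        if match is None:
            raise ValueError(f"unrecognised override: {part!r}")
        overrides[match.group(1)] = int(match.group(2))
    return int(default_part), overrides


def queue_instances(queues, hostgroups):
    """Return (seq_no, 'queue@host') pairs for every queue instance."""
    instances = []
    for queue_name, conf in queues.items():
        default, overrides = parse_seq_no(conf["seq_no"])
        for group in conf["hostlist"]:
            for host in hostgroups[group]:
                seq_no = overrides.get(group, default)
                instances.append((seq_no, f"{queue_name}@{host}"))
    return instances


def fill_order(queues, hostgroups):
    """Queue instances in ascending seq_no order."""
    return [name for _seq_no, name in sorted(queue_instances(queues, hostgroups))]


if __name__ == "__main__":
    for name in fill_order(QUEUES, HOSTGROUPS):
        print(name)
```

For the configuration shown, this lists the instances in the same order in which the array tasks 1 to 5 were placed.
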

and the same is true with plain sequential jobs

    > ntimes 5 qsub -q 'test_*' -b y /bin/sleep 5
    Your job 534 ("sleep") has been submitted
    Your job 535 ("sleep") has been submitted
    Your job 536 ("sleep") has been submitted
    Your job 537 ("sleep") has been submitted
    Your job 538 ("sleep") has been submitted

    > qstat -f -q 'test_*'
    queuename                      qtype used/tot. load_avg arch          states
    ----------------------------------------------------------------------------
    test_ser.q@angbor              BIP   1/1       0.10     lx24-x86
        534 0.55500 sleep      ah114088     r     07/17/2007 16:07:09     1
    ----------------------------------------------------------------------------
    test_ser.q@es-ergb01-01        BIP   1/1       0.34     sol-sparc64
        535 0.55500 sleep      ah114088     r     07/17/2007 16:07:09     1
    ----------------------------------------------------------------------------
    test_bg.q@es-ergb01-01         BIP   1/1       0.34     sol-sparc64
        536 0.55500 sleep      ah114088     t     07/17/2007 16:07:09     1
    ----------------------------------------------------------------------------
    test_ser.q@baumbart            BIP   1/1       0.19     irix65
        537 0.55500 sleep      ah114088     t     07/17/2007 16:07:09     1
    ----------------------------------------------------------------------------
    test_bg.q@baumbart             BIP   1/1       0.19     irix65
        538 0.55500 sleep      ah114088     t     07/17/2007 16:07:09     1

I did this with N1GE 6.1

Could it be that the jobs are submitted with the -soft option to specify
some preference? Or are you using overly sensitive load thresholds?
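
Both things can be checked mechanically. The sketch below is only an illustration under stated assumptions: it parses "name value" lines as printed by qconf -sq (continuation lines are not handled), flags a queue whose load_thresholds is anything other than NONE, and looks for -soft in a submit command line. The helper names, the sample threshold value and the sample qsub arguments are made up for the example and are not taken from your cluster.

```python
def parse_qconf_sq(text):
    """Parse 'attribute value' lines (as printed by qconf -sq) into a dict."""
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # attribute name, then the rest of the line
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def has_load_thresholds(conf):
    """True if the queue defines any load threshold (anything but NONE)."""
    return conf.get("load_thresholds", "NONE").upper() != "NONE"


def uses_soft_requests(qsub_args):
    """True if the submit command line switches to soft resource requests."""
    return "-soft" in qsub_args


# Attribute lines copied from the test_bg.q output in this mail.
SAMPLE_NONE = """\
hostlist              @twoG @fourG
seq_no                0,[@twoG=2969],[@fourG=4969]
load_thresholds       NONE
slots                 1
"""

# Hypothetical variant with a threshold set, for comparison only.
SAMPLE_SET = SAMPLE_NONE.replace("NONE", "np_load_avg=1.75")

if __name__ == "__main__":
    print(has_load_thresholds(parse_qconf_sq(SAMPLE_NONE)))
    print(has_load_thresholds(parse_qconf_sq(SAMPLE_SET)))
    # Hypothetical submit line with a soft request.
    print(uses_soft_requests(["qsub", "-soft", "-l", "mem_free=2G", "job.sh"]))
```

If either check comes back positive on your side, that would be the first place I would look before suspecting queue_sort_method itself.
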

Regards,
Andreas


