[GE users] Sort by sequence number question

Andreas.Haas at Sun.COM Andreas.Haas at Sun.COM
Fri Jul 20 11:31:36 BST 2007


Hi Erik,

I filed it as

    http://gridengine.sunsource.net/issues/show_bug.cgi?id=2331

As you can see from my comment there, resolving that riddle is laborious.

There is a chance someone will find time to work on it, but I cannot
guarantee it.

Regards,
Andreas


On Thu, 19 Jul 2007, Lönroth Erik wrote:

> 1. I'm using 6.0u8.
>
> 2. I completely re-installed one of my SGE cells and did only the minimal configuration required to make it possible to run a "sleep 60" job.
>
> 3. I set "sort by seq no"
>
> bash-3.00$  qconf -ssconf | grep seq
> queue_sort_method                 seqno
>
>
> 4. I configure the queues as follows:
>
> bash-3.00$ qstat -F | egrep "seq|@"
> master.101.q@ts101-1-0.sss.se. BIP   0/1       0.00     lx26-amd64
>        qf:seq_no=0
> short.101.q@ts101-1-1.sss.se.s BIP   0/4       0.00     lx26-amd64
>        qf:seq_no=0
> short.101.q@ts101-1-0.sss.se.s BIP   0/4       0.00     lx26-amd64
>        qf:seq_no=99
>
> 5. I have "$fill_up" in my "PE" and as you see here, the queues are configured with the PE.
>
> bash-3.00$ qconf -sq master.101.q | egrep "pe_list"
> pe_list               make powerflow_ts101_pe
> bash-3.00$ qconf -sq short.101.q | egrep "pe_list"
> pe_list               make powerflow_ts101_pe
>
>
> 6. I manipulate the host I want to be considered last, i.e. ts101-1-0:
>
> bash-3.00$ qconf -sq short.101.q | egrep "hostlist|seq_no"
> hostlist              ts101-1-0.sss.se.scania.com ts101-1-1.sss.se.scania.com
> seq_no                0,[ts101-1-0.sss.se.scania.com=99]
>
> bash-3.00$ qconf -sq master.101.q | egrep "hostlist|seq_no"
> hostlist              ts101-1-0.sss.se.scania.com
> seq_no                0
>
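Read literally, the seq_no settings above imply the ordering computed below. This is only a minimal sketch in Python of how the "default,[host=value]" syntax resolves per queue instance, not the scheduler's actual code. The helper names (parse_seq_no, seq_no_for) are made up for illustration, ties between equal sequence numbers are broken alphabetically as an assumption, and overrides keyed by a host group such as @2g.hg are parsed but not expanded to member hosts.

```python
import re

def parse_seq_no(spec):
    """Split 'default,[target=value],...' into (default, {target: value})."""
    default_part, _, rest = spec.partition(",")
    overrides = {m.group(1): int(m.group(2))
                 for m in re.finditer(r"\[([^=\]]+)=(\d+)\]", rest)}
    return int(default_part), overrides

def seq_no_for(spec, host):
    """Effective seq_no of one queue instance (literal host names only)."""
    default, overrides = parse_seq_no(spec)
    return overrides.get(host, default)

# Queue name -> (seq_no attribute, hostlist), copied from the qconf output above.
queues = {
    "master.101.q": ("0", ["ts101-1-0.sss.se.scania.com"]),
    "short.101.q": ("0,[ts101-1-0.sss.se.scania.com=99]",
                    ["ts101-1-0.sss.se.scania.com",
                     "ts101-1-1.sss.se.scania.com"]),
}

instances = [(seq_no_for(spec, host), f"{name}@{host}")
             for name, (spec, hosts) in queues.items()
             for host in hosts]

# Lowest sequence number first; alphabetical tie-break is an assumption.
for seq, inst in sorted(instances):
    print(seq, inst)
```

Under these assumptions short.101.q@ts101-1-0 sorts last, which is the ordering the configuration was meant to express.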
> 7. I submit the sleep job, asking for 2 slots with that PE. It is dispatched, but not the way I expected.
>
> bash-3.00$ qstat -t
> 19 0.55500 slot-alloc sssler       r     07/19/2007 16:02:51 master.101.q@ts101-1-0.sss.se. MASTER
> 19 0.55500 slot-alloc sssler       r     07/19/2007 16:02:51 short.101.q@ts101-1-0.sss.se.s SLAVE
>
> Apart from this - default configuration.
>
> /Erik
>
>
> -----Original Message-----
> From: Andreas.Haas at Sun.COM [mailto:Andreas.Haas at Sun.COM]
> Sent: Wed 7/18/2007 4:31 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Sort by sequence number question
>
> Hi Ravi,
>
> On Wed, 18 Jul 2007, Ravi Chandra Nallan wrote:
>
>> Hi Andreas,
>>
>> Andreas.Haas at Sun.COM wrote:
>>> On Tue, 17 Jul 2007, Paul MacInnis wrote:
>>>
>>>> The jobs here have no -soft options but they do have load thresholds:
>>>>
>>>> qname                 ser.q
>>>> hostlist              @1g.hg @2g.hg @4g.hg
>>>> seq_no                1965,[@2g.hg=2965],[@4g.hg=4965]
>>>> load_thresholds       load_avg=1.5,mem_used=500M,[@2g.hg=load_avg=1.5, \
>>>>                      mem_used=1.5G],[@4g.hg=load_avg=1.5,mem_used=3.5G]
>>>> suspend_thresholds    NONE
>>>>
>>>> qname                 bg.q
>>>> hostlist              @2g.hg @4g.hg
>>>> seq_no                2969,[@4g.hg=4969]
>>>> load_thresholds       load_avg=1.5,mem_used=1.5G,[@4g.hg=load_avg=1.5, \
>>>>                      mem_used=3.5G]
>>>> suspend_thresholds    load_avg=2.5
>>>>
>>>> Each 2G and 4G node has a ser.q and a bg.q queue instance, each with the same
>>>> load_thresholds, but the scheduler has a definite preference for the bg.q
>>>> instance, in spite of the higher seq_no!  Perhaps in time an explanation
>>>> will appear ...
>>>
>>> Could you check whether the behaviour changes at all when you set load_thresholds
>>> to NONE for both queues? Just temporarily, for testing purposes. Load
>>> thresholds always make setups hard to survey, whereas setups without load
>>> thresholds are fairly deterministic.
>>>
>>> Andreas
>> But does load_thresholds play a role in choosing the queue when the
>> queue_sort_method is set to seqno?
>
> Sure it does.
>
>> And if load_thresholds were to make a queue unusable, wouldn't the queue be set to
>> alarm state?
>
> Sure.
>
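The interaction can be pictured with a deliberately simplified model: instances whose load reaches a threshold go into alarm state and drop out, and only the remaining ones are ordered by seq_no. The sketch below is in Python and is not the scheduler's real algorithm. Host names n1 and n2, the load figures, and the ">=" comparison are assumptions for illustration; the seq_no and threshold values are taken from Paul's configuration, with memory in MB.

```python
def in_alarm(load, thresholds):
    """True if any load value reaches its threshold (assumed '>=' comparison)."""
    return any(load[name] >= limit for name, limit in thresholds.items())

def candidates(instances):
    """Drop instances in alarm state, then order the rest by seq_no."""
    usable = [qi for qi in instances
              if not in_alarm(qi["load"], qi["thresholds"])]
    return sorted(usable, key=lambda qi: qi["seq_no"])

# Hypothetical hosts: n1 stands for a member of @1g.hg, n2 for a member of @2g.hg.
load_n1 = {"load_avg": 0.3, "mem_used": 600}    # invented: above the 500M limit
load_n2 = {"load_avg": 0.3, "mem_used": 1000}   # invented: below the 1.5G limit

instances = [
    {"name": "ser.q@n1", "seq_no": 1965,
     "thresholds": {"load_avg": 1.5, "mem_used": 500}, "load": load_n1},
    {"name": "ser.q@n2", "seq_no": 2965,
     "thresholds": {"load_avg": 1.5, "mem_used": 1536}, "load": load_n2},
    {"name": "bg.q@n2", "seq_no": 2969,
     "thresholds": {"load_avg": 1.5, "mem_used": 1536}, "load": load_n2},
]

order = [qi["name"] for qi in candidates(instances)]
print(order)
```

In this model the instance with the lowest seq_no is skipped because it is in alarm state, so a job lands on a higher sequence number even though queue_sort_method is seqno. It does not by itself explain a preference for bg.q over ser.q on the same host, where both instances see the same load and thresholds.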
>> Also, I noticed that Erik was able to reproduce the problem with PE jobs. I am
>> not sure whether the array jobs had a similar problem.
>
> To me these are two different problems: (a) with Erik's parallel job allocation
> problem I already fail to understand why it behaved the way he described
> for 5.3 and how he needs it, whereas (b) Paul's sequential job problem looks to me
> like the outcome of a confusing load thresholds setup.
>
>> I couldn't reproduce it with either PE or array jobs; there must be some setting
>> that is affecting it.
>
> Yep. At least Paul's basic setup worked in my cluster here. For that
> reason I asked Paul to disable load thresholds temporarily.
>
> Regards,
> Andreas
>

http://gridengine.info/

Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering





