[GE users] Sort by sequence number question

Erik Lönroth erik.lonroth at scania.com
Wed Jul 18 09:24:53 BST 2007


On Wed, 2007-07-18 at 11:52 +0530, Ravi Chandra Nallan wrote:
> Hi Andreas,
> 
> Andreas.Haas at Sun.COM wrote:
> > On Tue, 17 Jul 2007, Paul MacInnis wrote:
> >
> >> The jobs here have no -soft options, but the queues do have load thresholds:
> >>
> >> qname                 ser.q
> >> hostlist              @1g.hg @2g.hg @4g.hg
> >> seq_no                1965,[@2g.hg=2965],[@4g.hg=4965]
> >> load_thresholds       load_avg=1.5,mem_used=500M,[@2g.hg=load_avg=1.5, \
> >>                      mem_used=1.5G],[@4g.hg=load_avg=1.5,mem_used=3.5G]
> >> suspend_thresholds    NONE
> >>
> >> qname                 bg.q
> >> hostlist              @2g.hg @4g.hg
> >> seq_no                2969,[@4g.hg=4969]
> >> load_thresholds       load_avg=1.5,mem_used=1.5G,[@4g.hg=load_avg=1.5, \
> >>                      mem_used=3.5G]
> >> suspend_thresholds    load_avg=2.5
> >>
> >> Each 2G and 4G node has a ser.q and a bg.q queue instance, each with the
> >> same load_thresholds, but the scheduler has a definite preference for the
> >> bg.q instance, in spite of the higher seq_no! Perhaps in time an
> >> explanation will appear ...
> >
> > Could you check whether the behaviour changes when you set load_thresholds
> > to NONE for both queues? Just temporarily, for testing purposes. Load
> > thresholds always make setups hard to analyse, whereas setups without
> > load thresholds are fairly deterministic.
> >
> > Andreas
> But does load_thresholds play a role in choosing the queue when
> queue_sort_method is set to seqno?
> And if load_thresholds were to make a queue unusable, wouldn't the queue be
> set to the alarm state?
> Also, I noticed that Erik was able to reproduce the problem with PE jobs. I
> am not sure whether the array jobs had a similar problem.
> I couldn't reproduce it with either PE or array jobs, so it must be some
> setting that is affecting it.
> 
> Ravi
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
Tried without load_thresholds - no effect.

# show the "slave" queue
qconf -sq ashort.101.q | egrep -1 "load_thresholds|seq_no|hostlist|slots"
qname                 ashort.101.q
hostlist              @ts101_X_hg
seq_no                101,[ts101-1-1.sss.se.scania.com=0], \
                      [ts101-1-0.sss.se.scania.com=101]
load_thresholds       NONE
suspend_thresholds    NONE
slots                 4

# show the master queue
qconf -sq master.101.q | egrep -1 "hostlist|seq|load_thre|slots"
qname                 master.101.q
hostlist              ts101-1-0.sss.se.scania.com
seq_no                1019
load_thresholds       NONE
suspend_thresholds    NONE
slots                 1

.... and after submit:

    274 0.55500 slot-alloc sssler       r     07/18/2007 10:18:01
ashort.101.q at ts101-1-0.sss.se. SLAVE

ashort.101.q at ts101-1-0.sss.se. SLAVE

ashort.101.q at ts101-1-0.sss.se. SLAVE

ashort.101.q at ts101-1-0.sss.se. SLAVE
    274 0.55500 slot-alloc sssler       r     07/18/2007 10:18:01
master.101.q at ts101-1-0.sss.se. MASTER
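
For reference, here is a minimal Python sketch (not Grid Engine code) of how
the seq_no values shown by qconf resolve per host, and which order a pure
sort by sequence number would give. It assumes that lower seq_no values sort
first and that ts101-1-1 is a member of @ts101_X_hg. The function names are
made up for illustration, and host group overrides such as [@4g.hg=4965] are
not resolved because that would need the group membership.

```python
import re


def effective_seq_no(seq_no_value, host):
    """Return the seq_no that applies to `host`.

    `seq_no_value` is the text printed by `qconf -sq`: a default value
    followed by optional [host=value] overrides. Overrides keyed by a
    host group (names starting with "@") only match if `host` is that
    exact string; group membership is not looked up in this sketch.
    """
    # Join continuation lines and drop all whitespace.
    flat = seq_no_value.replace("\\", "").replace("\n", "")
    flat = "".join(flat.split())
    default = int(flat.split(",", 1)[0])
    overrides = dict(re.findall(r"\[([^=\]]+)=(\d+)\]", flat))
    return int(overrides.get(host, default))


def sorted_by_seq_no(instances):
    """Sort (queue, host, seq_no_value) tuples by effective seq_no, ascending."""
    rows = [(effective_seq_no(value, host), queue, host)
            for queue, host, value in instances]
    return sorted(rows)


# Values copied from the qconf output above.
ASHORT_SEQ_NO = ("101,[ts101-1-1.sss.se.scania.com=0], \\\n"
                 "    [ts101-1-0.sss.se.scania.com=101]")
MASTER_SEQ_NO = "1019"

# Assumption: ts101-1-1 belongs to @ts101_X_hg and so has an ashort.101.q instance.
instances = [
    ("ashort.101.q", "ts101-1-0.sss.se.scania.com", ASHORT_SEQ_NO),
    ("ashort.101.q", "ts101-1-1.sss.se.scania.com", ASHORT_SEQ_NO),
    ("master.101.q", "ts101-1-0.sss.se.scania.com", MASTER_SEQ_NO),
]

for seq, queue, host in sorted_by_seq_no(instances):
    print(f"{seq:5d}  {queue}@{host}")
```

With these values the ashort.101.q instance on ts101-1-1 (seq_no 0) sorts
first, which is not where the slave tasks in the qstat output above ended up.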


Could this be caused by something more fundamental, such as server
communication, networking, or caching? Time to think differently, I guess.

/Erik
