[GE users] Sort by sequence number question

Lönroth Erik erik.lonroth at scania.com
Wed Jul 11 12:07:31 BST 2007



Yes, it is set. Still no luck.

The only way I can force the damn slaves off the MASTER node is to explicitly remove the requested "PE" from the nodes. This is not what I want, but I just can't make it work any other way. The scheduler simply ignores my sequence number altogether. I have recreated all queues and restarted the qmaster and scheduler, but no luck whatsoever.

Is there anything outside the general queue configuration and the cluster configuration that affects how the "sequence number" is applied?

/Erik
  


-----Original Message-----
From: Daniel Templeton [mailto:Dan.Templeton at Sun.COM]
Sent: Tue 7/10/2007 4:57 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Sort by sequence number question
 
Erik,

You did set the queue_sort_method to "seqno", right?
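
For reference, with queue_sort_method set to "seqno" the scheduler is documented to consider queue instances in ascending seq_no order. The following is a minimal Python sketch, not Grid Engine code, of how a seq_no value like the one in the short.101.q dump quoted below resolves per host. The helper names parse_seq_no and seqno_order are hypothetical, and the full host name used for ts101-1-1 is an assumption (the thread only shows it truncated).

```python
import re


def parse_seq_no(value):
    """Split a seq_no value such as '0,[hostA=1],[hostB=3]' into
    (default, {host: override})."""
    default_text, *rest = value.split(",", 1)
    overrides = {
        host: int(num)
        for host, num in re.findall(r"\[([^=\]]+)=(\d+)\]", rest[0] if rest else "")
    }
    return int(default_text), overrides


def seqno_order(hosts, seq_no_value):
    """Return hosts in ascending seq_no order (lowest is considered first)."""
    default, overrides = parse_seq_no(seq_no_value)
    # sorted() is stable, so hosts with equal seq_no keep their input order.
    return sorted(hosts, key=lambda h: overrides.get(h, default))


# ts101-1-0's name is taken from the quoted queue file; the FQDN for
# ts101-1-1 is assumed to follow the same pattern.
hosts = ["ts101-1-0.sss.se.scania.com", "ts101-1-1.sss.se.scania.com"]
order = seqno_order(hosts, "0,[ts101-1-0.sss.se.scania.com=1]")
print(order)
# ['ts101-1-1.sss.se.scania.com', 'ts101-1-0.sss.se.scania.com']
```

Under these assumptions the quoted configuration already ranks ts101-1-1 ahead of ts101-1-0, so the seq_no syntax itself does not appear to be the problem. The sketch says nothing about how the scheduler treats parallel environments or -masterq requests.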

Daniel

Lönroth Erik wrote:
> Yes, this is what I want... And I have done exactly this, but the scheduler seems to ignore the sequence number regardless of how I set it. ts101-1-0 will always get the SLAVES.
>
> This is how it looks:
>
> bash-3.00$ cat spool/qmaster/cqueues/short.101.q 
> qname              short.101.q
> hostlist           @ts101_X_hg
> seq_no             0,[ts101-1-0.sss.se.scania.com=1]
> load_thresholds    np_load_avg=1.75
> suspend_thresholds NONE
> nsuspend           1
> suspend_interval   00:05:00
> priority           0
> min_cpu_interval   00:05:00
> processors         UNDEFINED
> qtype              BATCH INTERACTIVE
> ckpt_list          NONE
> pe_list            make powerflow_ts101_pe
> rerun              FALSE
> slots              4
> tmpdir             /tmp
> shell              /bin/csh
> prolog             NONE
> epilog             NONE
> shell_start_mode   posix_compliant
> starter_method     NONE
> suspend_method     NONE
> resume_method      NONE
> terminate_method   NONE
> notify             00:00:60
>
> ... And despite this I get:
>
>     202 0.55500 nano       sssler       r     07/10/2007 13:35:49 master.101.q at ts101-1-0.sss.se. MASTER        
>     202 0.55500 nano       sssler       r     07/10/2007 13:35:49 short.101.q at ts101-1-0.sss.se.s SLAVE         
>                                                                   short.101.q at ts101-1-0.sss.se.s SLAVE         
>                                                                   short.101.q at ts101-1-0.sss.se.s SLAVE         
>                                                                   short.101.q at ts101-1-0.sss.se.s SLAVE
>
>
> The only way for me to get jobs onto ts101-1-1 is to set "slots=0" for ts101-1-0 in short.101.q, which is not what I want, since I want to be able to run jobs on that node whenever a master slot is not in use.
>
> Something is very wrong.
>
> /Erik
>
>
> -----Original Message-----
> From: Ravichandra.Nallan at Sun.COM [mailto:Ravichandra.Nallan at Sun.COM] 
> Sent: 10 July 2007 13:03
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Sort by sequence number question
>
>
>
> From the info provided it looks like you have 2 queues, master.101.q
> and short.101.q.
> short.101.q has 2 hosts (101-1-0, 101-1-1), and you want jobs to
> start on one host before the other, right?
>
> Did you set the seq_no on short.101.q? Can you run qconf -sq short.101.q |
> grep seq ?
> If I am right, a single seq_no only lets you choose one queue over another.
> But in your case you need to set the seq_no per host, since you want
> short.101.q at 101-1-1 to be allotted before short.101.q at 101-1-0, i.e.
> seq_no   1,[ts101-1-1.sss.se.s=2],[ts101-1-0.sss.se.s=3]
> should do the trick.
> Let me know if it helps,
> regards,
> Ravi
>
> Lönroth Erik wrote:
>   
>> Regardless how I try - this is always the outcome.
>>
>>     187 0.55500 nano       sssler       r     07/10/2007 12:40:10 master.101.q at ts101-1-0.sss.se. MASTER        
>>     187 0.55500 nano       sssler       r     07/10/2007 12:40:10 short.101.q at ts101-1-0.sss.se.s SLAVE         
>>                                                                   short.101.q at ts101-1-0.sss.se.s SLAVE         
>>                                                                   short.101.q at ts101-1-0.sss.se.s SLAVE         
>>                                                                   short.101.q at ts101-1-0.sss.se.s SLAVE
>>
>> The MASTER and SLAVES turn up on the same node.
>>
>>
>>
>> /Erik
>>
>>
>> -----Original Message-----
>> From: Lönroth Erik [mailto:erik.lonroth at scania.com]
>> Sent: 10 July 2007 12:32
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] Sort by sequence number question
>>
>>
>> I'm sure it worked before, but somehow the scheduler now keeps
>> assigning jobs this way, and in my desperation I'm starting to think
>> I'm crazy. It seems to ignore my "sequence numbers" entirely at the moment.
>>
>> I'll try fiddling with the "round robin" thing, but fill_up is what I
>> really want.
>>
>> /Erik
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: 10 July 2007 12:18
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Sort by sequence number question
>>
>>
>> Hi,
>>
>> I remember this discussion:
>>
>> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=20022
>>
>> It didn't solve your setup problem?
>>
>> -- Reuti
>>
>> PS: You could try to use $round_robin instead of $fill_up.
>>
>>
>> On 10.07.2007 at 11:24, Lönroth Erik wrote:
>>
>>   
>>     
>>> Hello!
>>>
>>> I have set up "sort by sequence number" for my cell, in "cluster
>>> configuration", because I want a few specific nodes in my
>>> cluster to be considered last when assigning jobs.
>>>
>>> Let's say I have 10 nodes, where the first 2 nodes are to be
>>> considered last. I have assigned them the sequence number "99" (just
>>> a high value) specifically in the cluster queue "short.q".
>>> Regardless of how I set this sequence number, the nodes I want to
>>> be considered last still get included.
>>>
>>> The nodes I want to be "considered last" are "MASTER" nodes; that's
>>> why I don't want any additional jobs running on them unless it
>>> is absolutely necessary.
>>>
>>> This is the queue situation where ts101-1-0 has a higher sequence
>>> number than ts101-1-1 (considered last?):
>>>
>>> master.101.q at ts101-1-0.sss.se. BIPC  0/1       0.00     lx26-amd64
>>> ----------------------------------------------------------------------------
>>> short.101.q at ts101-1-1.sss.se.s BIPC  0/4       0.00     lx26-amd64
>>> ----------------------------------------------------------------------------
>>> short.101.q at ts101-1-0.sss.se.s BIPC  0/4       0.00     lx26-amd64
>>>
>>>
>>> My PE is configured as:
>>> pe_name           generic_pe
>>> slots             9999
>>> user_lists        NONE
>>> xuser_lists       NONE
>>> start_proc_args   /opt/gridengine/apps/start_generic_pe.sh $pe_hostfile
>>> stop_proc_args    /opt/gridengine/apps/stop_generic_pe.sh
>>> allocation_rule   $fill_up
>>> control_slaves    FALSE
>>> job_is_first_task TRUE
>>> urgency_slots     min
>>>
>>> ----- At submit time -----
>>> When I submit a job (asking for 1 MASTER + 4 SLAVES) and no other
>>> specific requirements:
>>>
>>>     qsub -masterq master.*.q -pe generic_pe 5 basic-4-slots.sh
>>>
>>> Now, I would expect ts101-1-0 NOT to have any SLAVES allocated to
>>> it. BUT!
>>>
>>> ---- The allocation map -----
>>> qstat -t
>>>
>>>     176 0.55500 nano       sssler       r     07/10/2007 11:16:38 master.101.q at ts101-1-0.sss.se. MASTER
>>>     176 0.55500 nano       sssler       r     07/10/2007 11:16:38 short.101.q at ts101-1-0.sss.se.s SLAVE
>>>                                                                   short.101.q at ts101-1-0.sss.se.s SLAVE
>>>                                                                   short.101.q at ts101-1-0.sss.se.s SLAVE
>>>                                                                   short.101.q at ts101-1-0.sss.se.s SLAVE
>>>
>>>
>>> What am I doing wrong here? I want the situation to look like this
>>> (but it doesn't)
>>>
>>>
>>>     176 0.55500 nano       sssler       r     07/10/2007 11:16:38 master.101.q at ts101-1-0.sss.se. MASTER
>>>     176 0.55500 nano       sssler       r     07/10/2007 11:16:38 short.101.q at ts101-1-1.sss.se.s SLAVE
>>>                                                                   short.101.q at ts101-1-1.sss.se.s SLAVE
>>>                                                                   short.101.q at ts101-1-1.sss.se.s SLAVE
>>>                                                                   short.101.q at ts101-1-1.sss.se.s SLAVE
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>







