[GE users] Sort by sequence number question

Daniel Templeton Dan.Templeton at Sun.COM
Tue Jul 10 15:57:18 BST 2007



Erik,

You did set the queue_sort_method to "seqno", right?
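
One way to check is sketched below. The helper name "sort_method" is made up for this example, and it assumes the scheduler configuration is printed as "name value" lines, the way "qconf -ssconf" shows it:

```shell
# Sketch: pull the queue_sort_method value out of scheduler configuration
# text read from stdin. sort_method is a hypothetical helper, not a Grid
# Engine command; it assumes one "name value" pair per line.
sort_method() {
    awk '$1 == "queue_sort_method" { print $2 }'
}

# On a live cluster (assumption: qconf is on PATH):
#   qconf -ssconf | sort_method
```

If that prints anything other than "seqno", I would not expect the sequence numbers to be used for sorting.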

Daniel

Lönroth Erik wrote:
> Yes, this is what I want. I have done exactly this, but the scheduler seems to ignore the sequence number regardless of how I set it. ts101-1-0 will always get the SLAVES.
>
> This is how it looks:
>
> bash-3.00$ cat spool/qmaster/cqueues/short.101.q 
> qname              short.101.q
> hostlist           @ts101_X_hg
> seq_no             0,[ts101-1-0.sss.se.scania.com=1]
> load_thresholds    np_load_avg=1.75
> suspend_thresholds NONE
> nsuspend           1
> suspend_interval   00:05:00
> priority           0
> min_cpu_interval   00:05:00
> processors         UNDEFINED
> qtype              BATCH INTERACTIVE
> ckpt_list          NONE
> pe_list            make powerflow_ts101_pe
> rerun              FALSE
> slots              4
> tmpdir             /tmp
> shell              /bin/csh
> prolog             NONE
> epilog             NONE
> shell_start_mode   posix_compliant
> starter_method     NONE
> suspend_method     NONE
> resume_method      NONE
> terminate_method   NONE
> notify             00:00:60
>
> ... And despite this I get:
>
>     202 0.55500 nano       sssler       r     07/10/2007 13:35:49 master.101.q@ts101-1-0.sss.se. MASTER
>     202 0.55500 nano       sssler       r     07/10/2007 13:35:49 short.101.q@ts101-1-0.sss.se.s SLAVE
>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>
>
> The only way for me to get jobs onto ts101-1-1 is to set "slots=0" for ts101-1-0 in short.101.q, which is not what I want, since I want to be able to run jobs on that node whenever a master slot is not used.
>
> Something is very wrong.
>
> /Erik
>
>
> -----Original Message-----
> From: Ravichandra.Nallan at Sun.COM [mailto:Ravichandra.Nallan at Sun.COM] 
> Sent: den 10 juli 2007 13:03
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Sort by sequence number question
>
>
>
> From the info provided, it looks like you have 2 queues, master.101.q
> and short.101.q, and short.101.q has 2 hosts (101-1-0, 101-1-1), and
> you want jobs to start on one host before the other, right?
>
> Did you set the seq_no on short.101.q? Can you run
> qconf -sq short.101.q | grep seq ?
> If I am right, setting seq_no on the queue as a whole only lets you
> choose one queue over another. In your case you need to set the seq_no
> per host, since you want short.101.q@101-1-1 to be allotted before
> short.101.q@101-1-0, i.e.
> seq_no   1,[ts101-1-1.sss.se.s=2],[ts101-1-0.sss.se.s=3]
> should do the trick.
> Let me know if it helps,
> regards,
> Ravi
>
> Lönroth Erik wrote:
>   
>> Regardless how I try - this is always the outcome.
>>
>>     187 0.55500 nano       sssler       r     07/10/2007 12:40:10 master.101.q@ts101-1-0.sss.se. MASTER
>>     187 0.55500 nano       sssler       r     07/10/2007 12:40:10 short.101.q@ts101-1-0.sss.se.s SLAVE
>>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>>
>> The MASTER and SLAVES turn up on the same node.
>>
>>
>>
>> /Erik
>>
>>
>> -----Original Message-----
>> From: Lönroth Erik [mailto:erik.lonroth at scania.com]
>> Sent: den 10 juli 2007 12:32
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] Sort by sequence number question
>>
>>
>> I'm sure it worked before, but somehow the scheduler now keeps
>> assigning jobs like this, and in my desperation I'm starting to
>> think I'm crazy. It seems to ignore my "sequence numbers" entirely
>> at the moment.
>>
>> I'll try fiddling with the "round robin" thing, but fill_up is what
>> I really want.
>>
>> /Erik
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: den 10 juli 2007 12:18
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Sort by sequence number question
>>
>>
>> Hi,
>>
>> I remember this discussion:
>>
>> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=20022
>>
>> It didn't solve your setup problem?
>>
>> -- Reuti
>>
>> PS: You could try to use $round_robin instead of $fill_up.
>>
>>
>> Am 10.07.2007 um 11:24 schrieb Lönroth Erik:
>>
>>   
>>     
>>> Hello!
>>>
>>> I have set up "sort by sequence number" for my cell, in "cluster
>>> configuration", because I want a few specific nodes in my cluster
>>> to be considered last when assigning jobs.
>>>
>>> Let's say I have 10 nodes, where the first 2 nodes are to be
>>> considered last. I have assigned them a sequence number of "99"
>>> (just a high value) specifically in the cluster queue "short.q".
>>> Regardless of how I set this sequence number, the nodes I want to
>>> be considered last still get included.
>>>
>>> The nodes I want to be "considered last" are "MASTER" nodes; that's
>>> why I don't want any additional jobs running on them unless it is
>>> absolutely necessary.
>>>
>>> This is the queue situation where ts101-1-0 has a higher sequence
>>> number than ts101-1-1 (considered last?):
>>>
>>> master.101.q@ts101-1-0.sss.se. BIPC  0/1       0.00     lx26-amd64
>>> ----------------------------------------------------------------------------
>>> short.101.q@ts101-1-1.sss.se.s BIPC  0/4       0.00     lx26-amd64
>>> ----------------------------------------------------------------------------
>>> short.101.q@ts101-1-0.sss.se.s BIPC  0/4       0.00     lx26-amd64
>>>
>>>
>>> My PE is configured as:
>>> pe_name           generic_pe
>>> slots             9999
>>> user_lists        NONE
>>> xuser_lists       NONE
>>> start_proc_args   /opt/gridengine/apps/start_generic_pe.sh $pe_hostfile
>>> stop_proc_args    /opt/gridengine/apps/stop_generic_pe.sh
>>> allocation_rule   $fill_up
>>> control_slaves    FALSE
>>> job_is_first_task TRUE
>>> urgency_slots     min
>>>
>>> ----- At submit time -----
>>> When I submit a job (asking for 1 MASTER + 4 SLAVES) and no other
>>> specific requirements:
>>>
>>>     qsub -masterq master.*.q -pe generic_pe 5 basic-4-slots.sh
>>>
>>> Now - I would expect ts101-1-0  - NOT to have any SLAVES allocated to
>>> it. BUT!
>>>
>>> ---- The allocation map -----
>>> qstat -t
>>>
>>>     176 0.55500 nano       sssler       r     07/10/2007 11:16:38 master.101.q@ts101-1-0.sss.se. MASTER
>>>     176 0.55500 nano       sssler       r     07/10/2007 11:16:38 short.101.q@ts101-1-0.sss.se.s SLAVE
>>>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>>>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>>>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>>>
>>>
>>> What am I doing wrong here? I want the situation to look like this
>>> (but it doesn't)
>>>
>>>
>>>     176 0.55500 nano       sssler       r     07/10/2007 11:16:38 master.101.q@ts101-1-0.sss.se. MASTER
>>>     176 0.55500 nano       sssler       r     07/10/2007 11:16:38 short.101.q@ts101-1-1.sss.se.s SLAVE
>>>                                                                   short.101.q@ts101-1-1.sss.se.s SLAVE
>>>                                                                   short.101.q@ts101-1-1.sss.se.s SLAVE
>>>                                                                   short.101.q@ts101-1-1.sss.se.s SLAVE
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>>     
>>>       
>
>   
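
For what it's worth, the per-host seq_no value quoted above ("0,[ts101-1-0.sss.se.scania.com=1]") reads as a queue-wide default followed by per-host overrides. Below is a rough sketch of how that resolves for a given host. The helper name "seq_no_for_host" is made up for illustration and is not part of Grid Engine, and it only handles plain "[host=value]" entries:

```shell
# Sketch: resolve the effective seq_no of one host from a cluster-queue
# seq_no value such as "0,[ts101-1-0.sss.se.scania.com=1]".
# seq_no_for_host is a hypothetical helper, not a Grid Engine command.
# Usage: seq_no_for_host <seq_no value> <hostname>
seq_no_for_host() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -v host="$2" '
        NR == 1 { result = $0; next }          # first field: queue-wide default
        {
            gsub(/\[|\]/, "")                  # strip the surrounding brackets
            split($0, kv, "=")                 # kv[1] = host, kv[2] = seq_no
            if (kv[1] == host) result = kv[2]  # per-host override wins
        }
        END { print result }'
}

seq_no_for_host '0,[ts101-1-0.sss.se.scania.com=1]' ts101-1-0.sss.se.scania.com
seq_no_for_host '0,[ts101-1-0.sss.se.scania.com=1]' ts101-1-1.sss.se.scania.com
```

With that value, ts101-1-0 resolves to 1 and ts101-1-1 falls back to the default of 0. As far as I understand seqno sorting, the lower number is considered first, so ts101-1-1 would be expected to fill before ts101-1-0.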
