[GE users] Allocate cores in the same enclosure

Sofia Bassil sofia.bassil at fra.se
Wed Dec 17 15:45:08 GMT 2008


Hi Reuti,

Using one queue the way you specified makes my job run, and it allocates
cores correctly. Thank you! Plus, it's a slimmer solution whose
configuration is easier to read. Very nice!
Does it work with one queue but not with two because of a difference
between Grid Engine versions? Is it supposed to work with 2 queues as
well in 6.1?
The 24 was the result of sloppy copying.
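
For reference, here is a sketch of the single-queue setup, pieced together from Reuti's inline suggestions quoted below. Only the attributes discussed in this thread are shown. Everything else is assumed to stay as in the quoted "qconf -sq test1" output, and the PE slot count of 32 assumes the 4x8 layout rather than the miscopied 24.

```text
# queue test1 (relevant attributes only)
qname     test1
hostlist  @testSub1 @testSub2
pe_list   NONE,[@testSub1=test_1],[@testSub2=test_2]
slots     4

# PE test_1 (test_2 differs only in pe_name)
pe_name   test_1
slots     32
```

With this in place, the same wildcard request as before (qsub -pe "test*" ...) can match either PE, and each PE is only offered on the hosts of its own hostgroup.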

A follow-up question:
Is there a way to allocate jobs first in one hostgroup, and then in
another? Say I have 3 blade enclosures with 8 machines with 4 CPUs in
each. Enclosure 1 and 2 are right next to each other while enclosure 3
is in another location. I want my job to run on 40 cores with as high
bandwidth as possible. Can I somehow, using this same solution you have
helped me with, get the scheduler to allocate all the 32 cores in
enclosure 1 and then allocate the remaining 8 requested cores in
enclosure 2? Or is there some other way of doing this?
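
To make the numbers in the question concrete, here is a small shell sketch of the arithmetic only. It does not configure Grid Engine and does not show how the scheduler could be made to allocate this way; the variable names are made up for illustration.

```shell
#!/bin/sh
# Figures from the question: 8 machines per enclosure, 4 cores per machine,
# and a job that requests 40 cores. Variable names are hypothetical.
machines_per_enclosure=8
cores_per_machine=4
requested=40

# Cores available in one full enclosure.
cores_per_enclosure=$((machines_per_enclosure * cores_per_machine))

# Cores that spill over into the neighbouring enclosure.
remaining=$((requested - cores_per_enclosure))

# Machines needed in enclosure 2 for the spill-over (rounded up).
extra_machines=$(( (remaining + cores_per_machine - 1) / cores_per_machine ))

echo "enclosure 1: $cores_per_enclosure cores"
echo "enclosure 2: $remaining cores on $extra_machines machines"
```

So the desired placement is one full enclosure plus two machines from the neighbouring one, with enclosure 3 left untouched.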

//Sofia


reuti wrote:
> Hi,
>
> On 17.12.2008 at 14:48, Sofia Bassil wrote:
>
>   
>> Hi,
>>
>> OK. I think I have followed the instructions exactly, but I get the  
>> error so something must be missing. Here is what I set up:
>>
>> 2 queues, test1 and test2
>>     
>
> A small side note: if you prefer, you could also stay with one queue
> nowadays:
>
>
>   
>> 2 PE:s, test_1 and test_2
>> 2 hostgroups, @testSub1 and @testSub2
>> (sorry about the poor namings)
>>
>> # qconf -sq test1
>> qname test1 #test2
>> hostlist @testSub1 #@testSub2
>>     
>
> hostlist @testSub1 @testSub2
>
>
>   
>> seq_no 0
>> load_thresholds np_load_avg=1.50
>> suspend_thresholds NONE
>> nsuspend 1
>> suspend_interval 00:05:00
>> priority 0
>> min_cpu_interval 00:05:00
>> processors UNDEFINED
>> qtype BATCH INTERACTIVE
>> ckpt_list NONE
>> pe_list test_1 #test_2
>>     
>
> pe_list NONE,[@testSub1=test_1],[@testSub2=test_2]
>
>
>   
>> rerun FALSE
>> slots 4
>> tmpdir /tmp
>> shell /bin/bash
>> prolog NONE
>> epilog NONE
>> shell_start_mode unix_behavior
>> starter_method.....terminate_method=NONE
>> notify 00:00:60
>> owner_list....calendar=NONE
>> initial_state default
>> s_rt....h_vmem=INFINITY
>>
>> The only thing that differs in the queue test2 is qname, hostlist,  
>> and pe_list which are set to test2, @testSub2, and test_2  
>> respectively. There are 4 cores per machine and a total of 8  
>> machines in each hostgroup.
>>
>> # qconf -sp test_1
>> pe_name test_1
>> slots 24
>>     
>
> Is this a typo, or intentional to limit the usage? You stated 4x8 above:
>
> slots 32
>
> -- Reuti
>
>
>   
>> user_lists NONE
>> xuser_lists NONE
>> start_proc_args /bin/true
>> stop_proc_args /bin/true
>> allocation_rule $round_robin
>> control_slaves FALSE
>> job_is_first_rank FALSE
>> urgency_slots min
>>
>> The only thing that differs in the PE test_2 is pe_name which is  
>> set to test_2.
>>
>> # qconf -shgrp @testSub1
>> group_name @testSub1
>> hostlist host1.my.domain host2.my.domain host3.my.domain  
>> host4.my.domain host5.my.domain \
>>            host6.my.domain host7.my.domain host8.my.domain
>>
>> In testSub2 the group_name and the hostnames differ.
>>
>> When I run a job like this:
>> $ qsub -pe "test*" 1 ./testSub.sh
>> I get the error:
>> cannot run in PE "test_1" because it only offers 0 slots
>> cannot run in PE "test_2" because it only offers 0 slots
>>
>> $ qstat -j <jobid>|grep -v dropped
>> ...
>> sge_o_shell: /bin/bash
>> sge_o_workdir: /home/me
>> sge_o_host: myhost
>> account: sge
>> mail_list: me at myhost.my.domain
>> notify: FALSE
>> job_name: testSub.sh
>> jobshare: 0
>> env_list:
>> script_file: ./testSub.sh
>> parallel environment: test* range: 1
>>     cannot run in queue "anotherq" because PE "test_1" is not in pe list
>>     cannot run in PE "test_1" because it only offers 0 slots
>>     cannot run in queue "anotherq" because PE "test_2" is not in pe list
>>     cannot run in PE "test_2" because it only offers 0 slots
>>
>> $ cat testSub.sh
>> echo hostname=`hostname`
>> sleep 30
>>
>> //Sofia
>>
>>
>> Gerald Ragghianti wrote:
>>     
>>> Hi Sofia,
>>>
>>> Yes, I think that the solution described at
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21159
>>> is correct for your case. We have a similar situation where we need to
>>> group nodes according to which backend interconnect they use. This
>>> solution does work, but you will need to make sure that the queue
>>> instances for each node are included in the two cluster queues that
>>> you created (one queue for each PE that you have).
>>>
>>> - Gerald
>>>
>>> Sofia Bassil wrote:
>>>       
>>>> Hi Chris and thanks for the reply. I will gladly supply
>>>> configuration information, but my first question is if this
>>>> solution is even relevant for my problem?
>>>>
>>>> //Sofia
>>>>
>>>> craffi wrote:
>>>>         
>>>>> 2 questions --
>>>>> - What is the output of "qstat -f"?
>>>>> - Have you attached the PE's to any queues?
>>>>>
>>>>> -Chris
>>>>>
>>>>> On Dec 15, 2008, at 10:53 AM, Sofia Bassil wrote:
>>>>>           
>>>>>> Hello,
>>>>>>
>>>>>> I am trying to set up a node allocation scheme depending on
>>>>>> network layout, following this thread:
>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21153
>>>>>> I can't get it to work even though I have the same configuration
>>>>>> as far as I can see. I am not using the same version of Grid
>>>>>> Engine though, I am using GE 6.1u4.
>>>>>>
>>>>>> My cluster basically consists of a few blade enclosures and I
>>>>>> want it to be possible to ask for cores in one enclosure, so that
>>>>>> you can utilize the better bandwidth within the enclosure. My plan
>>>>>> is to set up all enclosures with one hostgroup, one queue, and one
>>>>>> pe each, like in the example.
>>>>>>
>>>>>> When I run a job like this:
>>>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>>>> I get the error:
>>>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>>>
>>>>>> My first question is, is it still possible to configure this (in
>>>>>> my version and in 6.2)? My second question is, for my problem, is
>>>>>> this the right solution?
>>>>>>
>>>>>> Sincerely,
>>>>>> Sofia Bassil
>>>>>>             
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92994

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


