[GE users] Allocate cores in the same enclosure

reuti reuti at staff.uni-marburg.de
Wed Dec 17 14:25:40 GMT 2008


Hi,

Am 17.12.2008 um 14:48 schrieb Sofia Bassil:

> Hi,
>
> OK. I think I have followed the instructions exactly, but I get the
> error, so something must be missing. Here is what I set up:
>
> 2 queues, test1 and test2

A small side note: if you prefer, you could nowadays also stay with
one queue:
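For example, a single cluster queue spanning both hostgroups could look
like this (a sketch using the names from below; the bracketed entries
override pe_list per hostgroup):

```
qname     test
hostlist  @testSub1 @testSub2
pe_list   NONE,[@testSub1=test_1],[@testSub2=test_2]
```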


> 2 PE:s, test_1 and test_2
> 2 hostgroups, @testSub1 and @testSub2
> (sorry about the poor namings)
>
> # qconf -sq test1
> qname test1 #test2
> hostlist @testSub1 #@testSub2

hostlist @testSub1 @testSub2


> seq_no 0
> load_thresholds np_load_avg=1.50
> suspend_thresholds NONE
> nsuspend 1
> suspend_interval 00:05:00
> priority 0
> min_cpu_interval 00:05:00
> processors UNDEFINED
> qtype BATCH INTERACTIVE
> ckpt_list NONE
> pe_list test_1 #test_2

pe_list NONE,[@testSub1=test_1],[@testSub2=test_2]


> rerun FALSE
> slots 4
> tmpdir /tmp
> shell /bin/bash
> prolog NONE
> epilog NONE
> shell_start_mode unix_behavior
> starter_method.....terminate_method=NONE
> notify 00:00:60
> owner_list....calendar=NONE
> initial_state default
> s_rt....h_vmem=INFINITY
>
> The only thing that differs in the queue test2 is qname, hostlist,  
> and pe_list which are set to test2, @testSub2, and test_2  
> respectively. There are 4 cores per machine and a total of 8  
> machines in each hostgroup.
>
> # qconf -se test_1
> pe_name test_1
> slots 24

Is this a typo, or intended to limit the usage? You stated 4x8 above:

slots 32
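As an aside, such a PE (and the matching hostgroup) can also be created
non-interactively with qconf's file-based variants (-Ahgrp adds a
hostgroup from a file, -Ap a PE). A sketch with made-up names
@encl1/encl1_pe and hosts node01..node08; slots = 4 cores x 8 hosts = 32:

```shell
# Write a hostgroup definition (hypothetical hosts node01..node08).
cat > encl1.hgrp <<'EOF'
group_name @encl1
hostlist node01 node02 node03 node04 node05 node06 node07 node08
EOF

# Write a matching PE definition: 8 hosts x 4 cores = 32 slots.
cat > encl1.pe <<'EOF'
pe_name           encl1_pe
slots             32
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $round_robin
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min
EOF

# Load both into the cluster (requires a live qmaster, hence commented out):
# qconf -Ahgrp encl1.hgrp
# qconf -Ap encl1.pe
```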

-- Reuti


> user_lists NONE
> xuser_lists NONE
> start_proc_args /bin/true
> stop_proc_args /bin/true
> allocation_rule $round_robin
> control_slaves FALSE
> job_is_first_task FALSE
> urgency_slots min
>
> The only thing that differs in the PE test_2 is pe_name which is  
> set to test_2.
>
> # qconf -shgrp @testSub1
> group_name @testSub1
> hostlist host1.my.domain host2.my.domain host3.my.domain  
> host4.my.domain host5.my.domain \
>            host6.my.domain host7.my.domain host8.my.domain
>
> In @testSub2, the group_name and the hostnames differ.
>
> When I run a job like this:
> $ qsub -pe "test*" 1 ./testSub.sh
> I get the error:
> cannot run in PE "test_1" because it only offers 0 slots
> cannot run in PE "test_2" because it only offers 0 slots
> $ qstat -j <jobid> | grep -v dropped
> ...
> sge_o_shell: /bin/bash
> sge_o_workdir: /home/me
> sge_o_host: myhost
> account: sge
> mail_list: me at myhost.my.domain
> notify: FALSE
> job_name: testSub.sh
> jobshare: 0
> env_list:
> script_file: ./testSub.sh
> parallel environment:  test* range: 1
>     cannot run in queue "anotherq" because PE "test_1" is not in pe list
>     cannot run in PE "test_1" because it only offers 0 slots
>     cannot run in queue "anotherq" because PE "test_2" is not in pe list
>     cannot run in PE "test_2" because it only offers 0 slots
>
> $ cat testSub.sh
> echo hostname=`hostname`
> sleep 30
>
> //Sofia
>
>
> Gerald Ragghianti wrote:
>>
>> Hi Sofia,
>>
>> Yes, I think that the solution described at
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21159
>> is correct for your case. We have a similar situation where we need to
>> group nodes according to which backend interconnect they use. This
>> solution does work, but you will need to make sure that the queue
>> instances for each node are included in the two cluster queues that you
>> created (one queue for each PE that you have).
>>
>> - Gerald
>>
>> Sofia Bassil wrote:
>>>
>>> Hi Chris, and thanks for the reply. I will gladly supply
>>> configuration information, but my first question is whether this
>>> solution is even relevant for my problem?
>>>
>>> //Sofia
>>>
>>> craffi wrote:
>>>>
>>>> Two questions:
>>>> - What is the output of "qstat -f"?
>>>> - Have you attached the PEs to any queues?
>>>>
>>>> -Chris
>>>>
>>>> On Dec 15, 2008, at 10:53 AM, Sofia Bassil wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to set up a node allocation scheme depending on network
>>>>> layout, following this thread:
>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21153
>>>>> I can't get it to work even though I have the same configuration as
>>>>> far as I can see. I am not using the same version of Grid Engine
>>>>> though; I am using GE 6.1u4. My cluster basically consists of a few
>>>>> blade enclosures, and I want it to be possible to ask for cores in
>>>>> one enclosure, so that you can utilize the better bandwidth within
>>>>> the enclosure. My plan is to set up all enclosures with one
>>>>> hostgroup, one queue, and one PE each, like in the example.
>>>>>
>>>>> When I run a job like this:
>>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>>> I get the error:
>>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>>
>>>>> My first question is, is it still possible to configure this (in my
>>>>> version and in 6.2)? My second question is, for my problem, is this
>>>>> the right solution?
>>>>>
>>>>> Sincerely,
>>>>> Sofia Bassil

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92980

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


