[GE users] Allocate cores in the same enclosure

Sofia Bassil sofia.bassil at fra.se
Wed Dec 17 15:10:38 GMT 2008


Hi,

No, none of my queues are in any error state.

//Sofia


craffi wrote:
> What is the output of "qstat -f"? Could it be that your queues are all  
> in error state and thus have no free slots to offer?
>
> -Chris
>
>
> On Dec 17, 2008, at 8:48 AM, Sofia Bassil wrote:
>
>> Hi,
>>
>> OK. I think I have followed the instructions exactly, but I get the
>> error, so something must be missing. Here is what I set up:
>>
>> 2 queues, test1 and test2
>> 2 PEs, test_1 and test_2
>> 2 hostgroups, @testSub1 and @testSub2
>> (sorry about the poor naming)
>>
>> # qconf -sq test1
>> qname test1 #test2
>> hostlist @testSub1 #@testSub2
>> seq_no 0
>> load_thresholds np_load_avg=1.50
>> suspend_thresholds NONE
>> nsuspend 1
>> suspend_interval 00:05:00
>> priority 0
>> min_cpu_interval 00:05:00
>> processors UNDEFINED
>> qtype BATCH INTERACTIVE
>> ckpt_list NONE
>> pe_list test_1 #test_2
>> rerun FALSE
>> slots 4
>> tmpdir /tmp
>> shell /bin/bash
>> prolog NONE
>> epilog NONE
>> shell_start_mode unix_behavior
>> starter_method ... terminate_method NONE
>> notify 00:00:60
>> owner_list ... calendar NONE
>> initial_state default
>> s_rt ... h_vmem INFINITY
>>
>> The only fields that differ in the queue test2 are qname, hostlist,
>> and pe_list, which are set to test2, @testSub2, and test_2
>> respectively. There are 4 cores per machine and a total of 8  
>> machines in each hostgroup.
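>>
>> For completeness, the second queue would look like this (only the
>> three changed fields shown; everything else is as in test1):

```
# qconf -sq test2
qname test2
hostlist @testSub2
pe_list test_2
```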
>>
>> # qconf -se test_1
>> pe_name test_1
>> slots 24
>> user_lists NONE
>> xuser_lists NONE
>> start_proc_args /bin/true
>> stop_proc_args /bin/true
>> allocation_rule $round_robin
>> control_slaves FALSE
>> job_is_first_rank FALSE
>> urgency_slots min
>>
>> The only field that differs in the PE test_2 is pe_name, which is set
>> to test_2.
>>
>> # qconf -shgrp @testSub1
>> group_name @testSub1
>> hostlist host1.my.domain host2.my.domain host3.my.domain  
>> host4.my.domain host5.my.domain \
>>            host6.my.domain host7.my.domain host8.my.domain
>>
>> In testSub2 the group_name and the hostnames differ.
>>
>> When I run a job like this:
>> $ qsub -pe "test*" 1 ./testSub.sh
>> I get the error:
>> cannot run in PE "test_1" because it only offers 0 slots
>> cannot run in PE "test_2" because it only offers 0 slots
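>>
>> As I understand it, the "test*" argument to -pe is matched against
>> the PE names with shell-glob semantics, so it should select both
>> test_1 and test_2. Just as an illustration (not SGE code), in Python:

```python
# Illustration only: emulate SGE-style wildcard PE selection with
# shell-glob matching from the standard library.
from fnmatch import fnmatch

pe_names = ["test_1", "test_2", "make"]
requested = "test*"

# Keep every PE whose name matches the requested glob pattern.
matched = [pe for pe in pe_names if fnmatch(pe, requested)]
print(matched)  # ['test_1', 'test_2']
```

>> So the scheduler does consider both PEs; the question is why each
>> one offers 0 slots.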
>>
>> $ qstat -j <jobid> | grep -v dropped
>> ...
>> sge_o_shell: /bin/bash
>> sge_o_workdir: /home/me
>> sge_o_host: myhost
>> account: sge
>> mail_list: me at myhost.my.domain
>> notify: FALSE
>> job_name: testSub.sh
>> jobshare: 0
>> env_list:
>> script_file: ./testSub.sh
>> parallel environment: test* range: 1
>>     cannot run in queue "anotherq" because PE "test_1" is not in pe list
>>     cannot run in PE "test_1" because it only offers 0 slots
>>     cannot run in queue "anotherq" because PE "test_2" is not in pe list
>>     cannot run in PE "test_2" because it only offers 0 slots
>>
>> $ cat testSub.sh
>> echo hostname=`hostname`
>> sleep 30
>>
>> //Sofia
>>
>>
>> Gerald Ragghianti wrote:
>>     
>>> Hi Sofia,
>>> Yes, I think that the solution described at
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21159
>>> is correct for your case.  We have a similar situation where we  
>>> need to
>>> group nodes according to which backend interconnect they use.  This
>>> solution does work, but you will need to make sure that the queue
>>> instances for each node are included in the two cluster queues that
>>> you created (one queue for each PE that you have).
>>>
>>> - Gerald
>>>
>>> Sofia Bassil wrote:
>>>
>>>> Hi Chris, and thanks for the reply. I will gladly supply
>>>> configuration information, but my first question is whether this
>>>> solution is even relevant to my problem.
>>>>
>>>> //Sofia
>>>>
>>>>
>>>> craffi wrote:
>>>>
>>>>> 2 questions --
>>>>>
>>>>>   - What is the output of "qstat -f" ?
>>>>>   - Have you attached the PE's to any queues?
>>>>>
>>>>> -Chris
>>>>>
>>>>>
>>>>> On Dec 15, 2008, at 10:53 AM, Sofia Bassil wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am trying to set up a node allocation scheme depending on  
>>>>>> network
>>>>>> layout, following this thread:
>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21153
>>>>>>
>>>>>> I can't get it to work even though I have the same configuration,
>>>>>> as far as I can see. I am not using the same version of Grid
>>>>>> Engine, though; I am using GE 6.1u4.
>>>>>>
>>>>>> My cluster basically consists of a few blade enclosures and I want
>>>>>> it to
>>>>>> be possible to ask for cores in one enclosure, so that you can  
>>>>>> utilize
>>>>>> the better bandwidth within the enclosure. My plan is to set up all
>>>>>> enclosures with one hostgroup, one queue, and one pe each, like  
>>>>>> in the
>>>>>> example.
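>>>>>>
>>>>>> Concretely, the plan per enclosure is roughly this (the names
>>>>>> here are just placeholders):

```
# one hostgroup, one queue, one PE per enclosure, e.g. enclosure 1:
hostgroup  @encl1      # the blade hosts in enclosure 1
queue      encl1.q     # hostlist @encl1, pe_list encl1_pe
PE         encl1_pe    # slots = total cores in the enclosure
```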
>>>>>>
>>>>>> When I run a job like this:
>>>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>>>> I get the error:
>>>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>>>
>>>>>> My first question is, is it still possible to configure this (in  
>>>>>> my
>>>>>> version and in 6.2)?
>>>>>> My second question is, for my problem, is this the right solution?
>>>>>>
>>>>>> Sincerely,
>>>>>> Sofia Bassil

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92988
