[GE users] Allocate cores in the same enclosure

Sofia Bassil sofia.bassil at fra.se
Wed Dec 17 13:48:00 GMT 2008


Hi,

OK. I think I have followed the instructions exactly, but I still get the 
error, so something must be missing. Here is what I set up:

2 queues, test1 and test2
2 PEs, test_1 and test_2
2 hostgroups, @testSub1 and @testSub2
(sorry about the poor naming)

# qconf -sq test1
qname test1 #test2
hostlist @testSub1 #@testSub2
seq_no 0
load_thresholds np_load_avg=1.50
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list test_1 #test_2
rerun FALSE
slots 4
tmpdir /tmp
shell /bin/bash
prolog NONE
epilog NONE
shell_start_mode unix_behavior
starter_method.....terminate_method=NONE
notify 00:00:60
owner_list....calendar=NONE
initial_state default
s_rt....h_vmem=INFINITY

The only things that differ in the queue test2 are qname, hostlist, and 
pe_list, which are set to test2, @testSub2, and test_2 respectively. 
There are 4 cores per machine and a total of 8 machines in each hostgroup.
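
As a rough sanity check on these numbers, the sketch below works out how many slots one enclosure should offer on paper. It assumes that "slots 4" in the queue applies per host and that the PE's "slots 24" (shown further down) caps the total; the variable names are made up for illustration.

```shell
# Figures taken from the configuration in this message.
cores_per_host=4     # "slots 4" in the queue, assumed to be per host
hosts_per_group=8    # machines in each hostgroup
pe_slots=24          # "slots 24" in the PE

# Total slots the queue instances of one hostgroup provide.
queue_slots=$((cores_per_host * hosts_per_group))

# A job cannot use more than the smaller of the two limits.
usable=$(( queue_slots < pe_slots ? queue_slots : pe_slots ))

echo "queue slots: $queue_slots, PE cap: $pe_slots, usable: $usable"
```

With 32 queue slots and a PE cap of 24, a one-slot request fits comfortably on paper, so the configured slot counts alone would not explain a "0 slots" message.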

# qconf -sp test_1
pe_name test_1
slots 24
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves FALSE
job_is_first_rank FALSE
urgency_slots min

The only thing that differs in the PE test_2 is pe_name, which is set to 
test_2.
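
One way to double-check that each PE really is attached to its queue might look like the sketch below. Note that pe_in_queue is a hypothetical helper, not part of Grid Engine; it only inspects text in the format that "qconf -sq" prints, and the sample lines are copied from the test1 listing above.

```shell
# Hypothetical helper, not part of Grid Engine: reads `qconf -sq <queue>`
# output on stdin and succeeds if the given PE appears in its pe_list line.
pe_in_queue() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            gsub(/,/, " ")                 # tolerate comma-separated lists
            for (i = 2; i <= NF; i++) if ($i == pe) found = 1
        }
        END { exit !found }'
}

# Two lines copied from the test1 queue listing (without the "#test_2" note).
sample='qname test1
pe_list test_1'

if printf '%s\n' "$sample" | pe_in_queue test_1; then
    echo "test_1 is attached"
fi
if ! printf '%s\n' "$sample" | pe_in_queue test_2; then
    echo "test_2 is not attached"
fi
```

On a live system the same check would read from the real configuration, for example: qconf -sq test1 | pe_in_queue test_1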

# qconf -shgrp @testSub1
group_name @testSub1
hostlist host1.my.domain host2.my.domain host3.my.domain host4.my.domain \
         host5.my.domain host6.my.domain host7.my.domain host8.my.domain

In @testSub2, only the group_name and the hostnames differ.

When I run a job like this:
$ qsub -pe "test*" 1 ./testSub.sh
I get the error:
cannot run in PE "test_1" because it only offers 0 slots
cannot run in PE "test_2" because it only offers 0 slots


$ qstat -j <jobid>|grep -v dropped
...
sge_o_shell: /bin/bash
sge_o_workdir: /home/me
sge_o_host: myhost
account: sge
mail_list: me at myhost.my.domain
notify: FALSE
job_name: testSub.sh
jobshare: 0
env_list:
script_file: ./testSub.sh
parallel environment: test* range: 1
                                        cannot run in queue "anotherq" because PE "test_1" is not in pe list
                                        cannot run in PE "test_1" because it only offers 0 slots
                                        cannot run in queue "anotherq" because PE "test_2" is not in pe list
                                        cannot run in PE "test_2" because it only offers 0 slots
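
When a wildcard request matches several PEs, it can help to pull out just the ones the scheduler reports as offering no slots. The following is a minimal sketch, assuming each scheduler message sits on a single line (mail clients may wrap them); zero_slot_pes is a hypothetical helper name, and the sample text is taken from the messages above.

```shell
# Hypothetical helper: reads `qstat -j <jobid>` text on stdin and prints
# the name of every PE reported as offering 0 slots, one per line.
zero_slot_pes() {
    sed -n 's/.*cannot run in PE "\([^"]*\)" because it only offers 0 slots.*/\1/p' |
        sort -u
}

# Scheduler messages from this job, one message per line.
sample='cannot run in queue "anotherq" because PE "test_1" is not in pe list
cannot run in PE "test_1" because it only offers 0 slots
cannot run in queue "anotherq" because PE "test_2" is not in pe list
cannot run in PE "test_2" because it only offers 0 slots'

printf '%s\n' "$sample" | zero_slot_pes
```

Against a real job this would be fed from the live command instead of the sample, for example: qstat -j <jobid> | zero_slot_pes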

$ cat testSub.sh
echo hostname=`hostname`
sleep 30

//Sofia


Gerald Ragghianti wrote:
> Hi Sofia,
> Yes, I think that the solution described at 
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21159 
> is correct for your case.  We have a similar situation where we need to 
> group nodes according to which backend interconnect they use.  This 
> solution does work, but you will need to make sure that the queue 
> instances for each node are included in the two cluster queues that you 
> created (one queue for each PE that you have).
>
> - Gerald
>
> Sofia Bassil wrote:
>> Hi Chris and thanks for the reply. I will gladly supply configuration 
>> information, but my first question is if this solution is even relevant 
>> for my problem?
>>
>> //Sofia
>>
>>
>> craffi wrote:
>>> 2 questions --
>>>
>>>   - What is the output of "qstat -f" ?
>>>   - Have you attached the PEs to any queues?
>>>
>>> -Chris
>>>
>>>
>>> On Dec 15, 2008, at 10:53 AM, Sofia Bassil wrote:
>>>
>>>> Hello,
>>>>
>>>> I am trying to set up a node allocation scheme depending on network
>>>> layout, following this thread:
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21153
>>>>
>>>> I can't get it to work even though, as far as I can see, I have the
>>>> same configuration. I am not using the same version of Grid Engine,
>>>> though: I am using GE 6.1u4.
>>>>
>>>> My cluster basically consists of a few blade enclosures, and I want it
>>>> to be possible to ask for cores in one enclosure, so that you can
>>>> utilize the better bandwidth within the enclosure. My plan is to set up
>>>> each enclosure with one hostgroup, one queue, and one PE, like in the
>>>> example.
>>>>
>>>> When I run a job like this:
>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>> I get the error:
>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>
>>>> My first question is, is it still possible to configure this (in my
>>>> version and in 6.2)?
>>>> My second question is, for my problem, is this the right solution?
>>>>
>>>> Sincerely,
>>>> Sofia Bassil
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92775
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92970
