[GE users] Allocate cores in the same enclosure

reuti reuti at staff.uni-marburg.de
Wed Dec 17 23:05:08 GMT 2008


On 17.12.2008, at 23:50, Gerald Ragghianti wrote:

> Where can I find documentation on this abbreviated queue definition
> format?  Is it supported in 6.1u5? Can it work with more than two
> hostgroups?

Nowhere; it's a comment added by the author of the post to shorten it.

-- Reuti
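
(For reference: everything after the "#" in those lines is only a
comment, so the effective definition is simply

  qname     test1
  hostlist  @testSub1

with the comments marking where the second queue -- test2 on
@testSub2 -- differed.)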


> qname test1 #test2
> hostlist @testSub1 #@testSub2
>
>
> - Gerald
>
> Sofia Bassil wrote:
>> Hi Reuti,
>>
>> Using one queue the way you specified makes my job run, and it
>> allocated cores correctly. Thank you! Plus, it's a slimmer solution
>> whose configuration is easier to read. Very nice!
>> As for the fact that it works with one queue but not two: is that
>> due to a difference in Grid Engine versions? Is it supposed to work
>> with 2 queues as well in 6.1?
>> The 24 was a result of sloppy copying.
>>
>> A follow-up question:
>> Is there a way to allocate jobs first in one hostgroup, and then in
>> another? Say I have 3 blade enclosures, each with 8 machines of 4
>> CPUs. Enclosures 1 and 2 are right next to each other, while
>> enclosure 3 is in another location. I want my job to run on 40 cores
>> with as high bandwidth as possible. Can I somehow, using this same
>> solution you have helped me with, get the scheduler to allocate all
>> 32 cores in enclosure 1 and then allocate the remaining 8 requested
>> cores in enclosure 2? Or is there some other way of doing this?
>>
>> //Sofia
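
(A possible approach to the follow-up question above -- an untested
sketch, not something confirmed in this thread: the scheduler can be
told to fill queue instances in a fixed order by sorting on sequence
numbers, and seq_no accepts hostgroup-specific values. Assuming a
third hostgroup @testSub3 for enclosure 3:

  # qconf -msconf
  queue_sort_method    seqno

  # qconf -mq test1
  seq_no               99,[@testSub1=1],[@testSub2=2],[@testSub3=3]

With this, free slots in @testSub1 are offered before @testSub2, so a
40-slot request would tend to take the 32 cores of enclosure 1 first
and the remaining 8 from enclosure 2.)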
>>
>>
>> reuti wrote:
>>> Hi,
>>>
>>> On 17.12.2008, at 14:48, Sofia Bassil wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> OK. I think I have followed the instructions exactly, but I get the
>>>> error, so something must be missing. Here is what I set up:
>>>>
>>>> 2 queues, test1 and test2
>>>>
>>>
>>> Small side note: if you prefer, you could also stay with one queue
>>> nowadays:
>>>
>>>
>>>
>>>> 2 PEs, test_1 and test_2
>>>> 2 hostgroups, @testSub1 and @testSub2
>>>> (sorry about the poor naming)
>>>>
>>>> # qconf -sq test1
>>>> qname test1 #test2
>>>> hostlist @testSub1 #@testSub2
>>>>
>>>
>>> hostlist @testSub1 @testSub2
>>>
>>>
>>>
>>>> seq_no 0
>>>> load_thresholds np_load_avg=1.50
>>>> suspend_thresholds NONE
>>>> nsuspend 1
>>>> suspend_interval 00:05:00
>>>> priority 0
>>>> min_cpu_interval 00:05:00
>>>> processors UNDEFINED
>>>> qtype BATCH INTERACTIVE
>>>> ckpt_list NONE
>>>> pe_list test_1 #test_2
>>>>
>>>
>>> pe_list NONE,[@testSub1=test_1],[@testSub2=test_2]
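
(The bracket notation above is Grid Engine's hostgroup-specific
attribute override: the default pe_list is NONE, hosts in @testSub1
offer PE test_1, and hosts in @testSub2 offer PE test_2. A minimal
sketch of the resulting one-queue setup, using only values already
shown in this thread:

  qname     test1
  hostlist  @testSub1 @testSub2
  pe_list   NONE,[@testSub1=test_1],[@testSub2=test_2]
  slots     4

Since each PE is only offered on its own hostgroup's queue instances,
a "-pe test*" request is still satisfied within a single enclosure.)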
>>>
>>>
>>>
>>>> rerun FALSE
>>>> slots 4
>>>> tmpdir /tmp
>>>> shell /bin/bash
>>>> prolog NONE
>>>> epilog NONE
>>>> shell_start_mode unix_behavior
>>>> starter_method ... terminate_method=NONE
>>>> notify 00:00:60
>>>> owner_list ... calendar=NONE
>>>> initial_state default
>>>> s_rt ... h_vmem=INFINITY
>>>>
>>>> The only things that differ in the queue test2 are qname, hostlist,
>>>> and pe_list, which are set to test2, @testSub2, and test_2
>>>> respectively. There are 4 cores per machine and a total of 8
>>>> machines in each hostgroup.
>>>>
>>>> # qconf -sp test_1
>>>> pe_name test_1
>>>> slots 24
>>>>
>>>
>>> Is this a typo, or by intention to limit the usage? You stated
>>> 4x8 (= 32) above:
>>>
>>> slots 32
>>>
>>> -- Reuti
>>>
>>>
>>>
>>>> user_lists NONE
>>>> xuser_lists NONE
>>>> start_proc_args /bin/true
>>>> stop_proc_args /bin/true
>>>> allocation_rule $round_robin
>>>> control_slaves FALSE
>>>> job_is_first_rank FALSE
>>>> urgency_slots min
>>>>
>>>> The only thing that differs in the PE test_2 is pe_name which is
>>>> set to test_2.
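
(A side note on the PE definition above, not raised in the thread:
allocation_rule $round_robin spreads a job's slots across as many
hosts as possible. For packing a job onto as few machines as possible
within an enclosure, sge_pe(5) also offers

  allocation_rule    $fill_up

which fills each host completely before moving on to the next one.)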
>>>>
>>>> # qconf -shgrp @testSub1
>>>> group_name @testSub1
>>>> hostlist host1.my.domain host2.my.domain host3.my.domain
>>>> host4.my.domain host5.my.domain \
>>>>            host6.my.domain host7.my.domain host8.my.domain
>>>>
>>>> In @testSub2, the group_name and the hostnames differ.
>>>>
>>>> When I run a job like this:
>>>>
>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>>
>>>> I get the error:
>>>>
>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>
>>>> $ qstat -j <jobid> | grep -v dropped
>>>> ...
>>>> sge_o_shell: /bin/bash
>>>> sge_o_workdir: /home/me
>>>> sge_o_host: myhost
>>>> account: sge
>>>> mail_list: me at myhost.my.domain
>>>> notify: FALSE
>>>> job_name: testSub.sh
>>>> jobshare: 0
>>>> env_list:
>>>> script_file: ./testSub.sh
>>>> parallel environment:  test* range: 1
>>>>                        cannot run in queue "anotherq" because PE "test_1" is not in pe list
>>>>                        cannot run in PE "test_1" because it only offers 0 slots
>>>>                        cannot run in queue "anotherq" because PE "test_2" is not in pe list
>>>>                        cannot run in PE "test_2" because it only offers 0 slots
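
(The "offers 0 slots" message is the scheduler's generic answer when
no queue instance both carries the PE in its pe_list and has matching
free slots. Two quick checks -- a suggestion only, not from the
original mails:

  $ qconf -sq test1 | grep -E 'hostlist|pe_list'
  $ qstat -f -q "test1"

The first shows which hosts and PEs the queue is tied to; the second
lists the individual queue instances and their slot counts.)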
>>>>
>>>> $ cat testSub.sh
>>>> echo hostname=`hostname`
>>>> sleep 30
>>>>
>>>> //Sofia
>>>>
>>>>
>>>> Gerald Ragghianti wrote:
>>>>
>>>>> Hi Sofia,
>>>>>
>>>>> Yes, I think that the solution described at
>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21159
>>>>> is correct for your case. We have a similar situation where we need
>>>>> to group nodes according to which backend interconnect they use.
>>>>> This solution does work, but you will need to make sure that the
>>>>> queue instances for each node are included in the two cluster
>>>>> queues that you created (one queue for each PE that you have).
>>>>>
>>>>> - Gerald
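
(One way to verify Gerald's point about queue instances -- a suggested
check, not part of the original mails:

  $ qstat -f -q "test*"

lists every queue instance (queue@host) for both cluster queues, so
any missing host shows up immediately.)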
>>>>>
>>>>> Sofia Bassil wrote:
>>>>>> Hi Chris, and thanks for the reply. I will gladly supply
>>>>>> configuration information, but my first question is whether this
>>>>>> solution is even relevant for my problem?
>>>>>>
>>>>>> //Sofia
>>>>>>
>>>>>> craffi wrote:
>>>>>>
>>>>>>> 2 questions:
>>>>>>>
>>>>>>> - What is the output of "qstat -f"?
>>>>>>> - Have you attached the PEs to any queues?
>>>>>>>
>>>>>>> -Chris
>>>>>>>
>>>>>>> On Dec 15, 2008, at 10:53 AM, Sofia Bassil wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I am trying to set up a node allocation scheme depending on
>>>>>>>> network layout, following this thread:
>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21153
>>>>>>>> I can't get it to work even though I have the same configuration
>>>>>>>> as far as I can see. I am not using the same version of Grid
>>>>>>>> Engine though; I am using GE 6.1u4.
>>>>>>>>
>>>>>>>> My cluster basically consists of a few blade enclosures, and I
>>>>>>>> want it to be possible to ask for cores in one enclosure, so
>>>>>>>> that you can utilize the better bandwidth within the enclosure.
>>>>>>>> My plan is to set up each enclosure with one hostgroup, one
>>>>>>>> queue, and one PE, like in the example.
>>>>>>>>
>>>>>>>> When I run a job like this:
>>>>>>>>
>>>>>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>>>>>>
>>>>>>>> I get the error:
>>>>>>>>
>>>>>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>>>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>>>>>
>>>>>>>> My first question is: is it still possible to configure this
>>>>>>>> (in my version and in 6.2)? My second question is: for my
>>>>>>>> problem, is this the right solution?
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Sofia Bassil
>>>>>>>>
>
>
> -- 
> Gerald Ragghianti
> IT Administrator - High Performance Computing
> http://hpc.usg.utk.edu/
> Office of Information Technology
> University of Tennessee
> Phone: 865-974-2448
> E-mail: geri at utk.edu
>
