[GE users] Allocate cores in the same enclosure

reuti reuti at staff.uni-marburg.de
Wed Dec 17 23:28:01 GMT 2008


On 18.12.2008 at 00:21, Gerald Ragghianti wrote:

> The following looks like an actual feature that would allow me to  
> associate two host groups with their respective PE's without making  
> two cluster-queues.  Is this not the case?  I've not seen this  
> documented anywhere.
>
> pe_list NONE,[@testSub1=test_1],[@testSub2=test_2]

Yep, this is available. But not the #qname feature or the like, which I
thought you were referring to.

It's at the beginning of `man queue_conf`, in the section "FORMAT - hostlist".
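
To put it together for your setup (a sketch, using the hostgroup and PE
names from this thread): the relevant lines of a single cluster queue,
editable with `qconf -mq test1`, would look roughly like this:

   hostlist  @testSub1 @testSub2
   pe_list   NONE,[@testSub1=test_1],[@testSub2=test_2]

The bracketed entries override the default value (NONE) per hostgroup,
so a job requesting test_1 can only be placed on @testSub1 hosts, and a
job requesting test_2 only on @testSub2 hosts.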

-- Reuti


>
> - Gerald
>
>
> reuti wrote:
>> On 17.12.2008 at 23:50, Gerald Ragghianti wrote:
>>
>>
>>> Where can I find documentation on this abbreviated queue definition
>>> format?  Is it supported in 6.1u5? Can it work with more than two
>>> hostgroups?
>>>
>>
>> Nowhere; it's a comment added by the author of the post to shorten it.
>>
>> -- Reuti
>>
>>
>>
>>> qname test1 #test2
>>> hostlist @testSub1 #@testSub2
>>>
>>>
>>> - Gerald
>>>
>>> Sofia Bassil wrote:
>>>
>>>> Hi Reuti,
>>>>
>>>> Using one queue the way you specified makes my job run, and it
>>>> allocated the cores correctly. Thank you! Plus it's a slimmer
>>>> solution whose configuration is easier to read. Very nice!
>>>> Is the fact that it works with one queue but not two due to a
>>>> difference between Grid Engine versions? Is it supposed to work
>>>> with two queues as well in 6.1?
>>>> The 24 was a result of sloppy copying.
>>>>
>>>> A follow-up question:
>>>> Is there a way to allocate jobs first in one hostgroup and then in
>>>> another? Say I have 3 blade enclosures, each with 8 machines with 4
>>>> CPUs. Enclosures 1 and 2 are right next to each other, while
>>>> enclosure 3 is in another location. I want my job to run on 40
>>>> cores with as high bandwidth as possible. Can I somehow, using this
>>>> same solution you have helped me with, get the scheduler to
>>>> allocate all 32 cores in enclosure 1 and then allocate the
>>>> remaining 8 requested cores in enclosure 2? Or is there some other
>>>> way of doing this?
>>>>
>>>> //Sofia
>>>>
>>>>
>>>> reuti wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On 17.12.2008 at 14:48, Sofia Bassil wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> OK. I think I have followed the instructions exactly, but I get
>>>>>> the error, so something must be missing. Here is what I set up:
>>>>>>
>>>>>> 2 queues, test1 and test2
>>>>>>
>>>>>>
>>>>> Small side note: if you prefer, you could also stay with one
>>>>> queue nowadays:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> 2 PEs, test_1 and test_2
>>>>>> 2 hostgroups, @testSub1 and @testSub2
>>>>>> (sorry about the poor naming)
>>>>>>
>>>>>> # qconf -sq test1
>>>>>> qname test1 #test2
>>>>>> hostlist @testSub1 #@testSub2
>>>>>>
>>>>>>
>>>>> hostlist @testSub1 @testSub2
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> seq_no 0
>>>>>> load_thresholds np_load_avg=1.50
>>>>>> suspend_thresholds NONE
>>>>>> nsuspend 1
>>>>>> suspend_interval 00:05:00
>>>>>> priority 0
>>>>>> min_cpu_interval 00:05:00
>>>>>> processors UNDEFINED
>>>>>> qtype BATCH INTERACTIVE
>>>>>> ckpt_list NONE
>>>>>> pe_list test_1 #test_2
>>>>>>
>>>>>>
>>>>> pe_list NONE,[@testSub1=test_1],[@testSub2=test_2]
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> rerun FALSE
>>>>>> slots 4
>>>>>> tmpdir /tmp
>>>>>> shell /bin/bash
>>>>>> prolog NONE
>>>>>> epilog NONE
>>>>>> shell_start_mode unix_behavior
>>>>>> starter_method.....terminate_method=NONE
>>>>>> notify 00:00:60
>>>>>> owner_list....calendar=NONE
>>>>>> initial_state default
>>>>>> s_rt....h_vmem=INFINITY
>>>>>>
>>>>>> The only things that differ in the queue test2 are qname,
>>>>>> hostlist, and pe_list, which are set to test2, @testSub2, and
>>>>>> test_2 respectively. There are 4 cores per machine and a total
>>>>>> of 8 machines in each hostgroup.
>>>>>>
>>>>>> # qconf -se test_1
>>>>>> pe_name test_1
>>>>>> slots 24
>>>>>>
>>>>>>
>>>>> Is this a typo or intentional, to limit the usage? You stated 4x8
>>>>> (= 32) above:
>>>>>
>>>>> slots 32
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> user_lists NONE
>>>>>> xuser_lists NONE
>>>>>> start_proc_args /bin/true
>>>>>> stop_proc_args /bin/true
>>>>>> allocation_rule $round_robin
>>>>>> control_slaves FALSE
>>>>>> job_is_first_rank FALSE
>>>>>> urgency_slots min
>>>>>>
>>>>>> The only thing that differs in the PE test_2 is pe_name, which
>>>>>> is set to test_2.
>>>>>>
>>>>>> # qconf -shgrp @testSub1
>>>>>> group_name @testSub1
>>>>>> hostlist host1.my.domain host2.my.domain host3.my.domain \
>>>>>>          host4.my.domain host5.my.domain host6.my.domain \
>>>>>>          host7.my.domain host8.my.domain
>>>>>>
>>>>>> In testSub2 the group_name and the hostnames differ.
>>>>>>
>>>>>> When I run a job like this:
>>>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>>>> I get the error:
>>>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>>>
>>>>>> $ qstat -j <jobid> | grep -v dropped
>>>>>> ...
>>>>>> sge_o_shell: /bin/bash
>>>>>> sge_o_workdir: /home/me
>>>>>> sge_o_host: myhost
>>>>>> account: sge
>>>>>> mail_list: me at myhost.my.domain
>>>>>> notify: FALSE
>>>>>> job_name: testSub.sh
>>>>>> jobshare: 0
>>>>>> env_list:
>>>>>> script_file: ./testSub.sh
>>>>>> parallel environment:  test* range: 1
>>>>>>     cannot run in queue "anotherq" because PE "test_1" is not in pe list
>>>>>>     cannot run in PE "test_1" because it only offers 0 slots
>>>>>>     cannot run in queue "anotherq" because PE "test_2" is not in pe list
>>>>>>     cannot run in PE "test_2" because it only offers 0 slots
>>>>>>
>>>>>> $ cat testSub.sh
>>>>>> echo hostname=`hostname`
>>>>>> sleep 30
>>>>>>
>>>>>> //Sofia
>>>>>>
>>>>>>
>>>>>> Gerald Ragghianti wrote:
>>>>>>
>>>>>>
>>>>>>> Hi Sofia,
>>>>>>>
>>>>>>> Yes, I think that the solution described at
>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21159
>>>>>>> is correct for your case. We have a similar situation where we
>>>>>>> need to group nodes according to which backend interconnect they
>>>>>>> use. This solution does work, but you will need to make sure
>>>>>>> that the queue instances for each node are included in the two
>>>>>>> cluster queues that you created (one queue for each PE that you
>>>>>>> have).
>>>>>>>
>>>>>>> - Gerald
>>>>>>>
>>>>>>> Sofia Bassil wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Hi Chris, and thanks for the reply. I will gladly supply
>>>>>>>> configuration information, but my first question is whether
>>>>>>>> this solution is even relevant for my problem?
>>>>>>>>
>>>>>>>> //Sofia
>>>>>>>>
>>>>>>>> craffi wrote:
>>>>>>>>
>>>>>>>>> 2 questions:
>>>>>>>>> - What is the output of "qstat -f"?
>>>>>>>>> - Have you attached the PEs to any queues?
>>>>>>>>>
>>>>>>>>> -Chris
>>>>>>>>>
>>>>>>>>> On Dec 15, 2008, at 10:53 AM, Sofia Bassil wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I am trying to set up a node allocation scheme depending on
>>>>>>>>>> network layout, following this thread:
>>>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21153
>>>>>>>>>> I can't get it to work even though I have the same
>>>>>>>>>> configuration as far as I can see. I am not using the same
>>>>>>>>>> version of Grid Engine though; I am using GE 6.1u4. My
>>>>>>>>>> cluster basically consists of a few blade enclosures, and I
>>>>>>>>>> want it to be possible to ask for cores in one enclosure, so
>>>>>>>>>> that you can utilize the better bandwidth within the
>>>>>>>>>> enclosure. My plan is to set up all enclosures with one
>>>>>>>>>> hostgroup, one queue, and one PE each, like in the example.
>>>>>>>>>>
>>>>>>>>>> When I run a job like this:
>>>>>>>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>>>>>>>> I get the error:
>>>>>>>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>>>>>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>>>>>>>
>>>>>>>>>> My first question is, is it still possible to configure this
>>>>>>>>>> (in my version and in 6.2)? My second question is, for my
>>>>>>>>>> problem, is this the right solution?
>>>>>>>>>>
>>>>>>>>>> Sincerely,
>>>>>>>>>> Sofia Bassil
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>>>
>>>>>
>>> -- 
>>> Gerald Ragghianti
>>> IT Administrator - High Performance Computing
>>> http://hpc.usg.utk.edu/
>>> Office of Information Technology
>>> University of Tennessee
>>> Phone: 865-974-2448
>>> E-mail: geri at utk.edu
>>>
>>>
>>
>>
>
>
> -- 
> Gerald Ragghianti
> IT Administrator - High Performance Computing
> http://hpc.usg.utk.edu/
> Office of Information Technology
> University of Tennessee
> Phone: 865-974-2448
> E-mail: geri at utk.edu
>
