[GE users] Allocate cores in the same enclosure

Gerald Ragghianti geri at utk.edu
Wed Dec 17 23:21:02 GMT 2008


The following looks like an actual feature that would allow me to associate two host groups with their respective PEs without creating two cluster queues.  Is this not the case?  I've not seen it documented anywhere.

pe_list NONE,[@testSub1=test_1],[@testSub2=test_2]
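
If it is, the two-queue setup discussed below would collapse into a
single cluster queue, something like this (untested sketch, reusing the
hostgroup and PE names from the thread):

qname    test
hostlist @testSub1 @testSub2
pe_list  NONE,[@testSub1=test_1],[@testSub2=test_2]

i.e. no PE by default, and each hostgroup's queue instances would offer
only their own PE.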


- Gerald


reuti wrote:
> On 17.12.2008 at 23:50, Gerald Ragghianti wrote:
>
>   
>> Where can I find documentation on this abbreviated queue definition
>> format?  Is it supported in 6.1u5? Can it work with more than two
>> hostgroups?
>>     
>
> Nowhere; it's shorthand the author of that post used to shorten it.
>
> -- Reuti
>
>
>   
>> qname test1 #test2
>> hostlist @testSub1 #@testSub2
>>
>>
>> - Gerald
>>
>> Sofia Bassil wrote:
>>     
>>> Hi Reuti,
>>>
>>> Using one queue the way you specified makes my job run, and it
>>> allocated cores correctly. Thank you! Plus, it's a slimmer solution
>>> whose configuration is easier to read. Very nice!
>>>
>>> The fact that it works with one queue but not two: is that a
>>> difference between Grid Engine versions? Is it supposed to work with
>>> two queues as well in 6.1?
>>>
>>> The 24 was a result of sloppy copying.
>>>
>>> A follow-up question:
>>> Is there a way to allocate jobs first in one hostgroup and then in
>>> another? Say I have 3 blade enclosures, each with 8 machines with 4
>>> CPUs. Enclosures 1 and 2 are right next to each other, while
>>> enclosure 3 is in another location. I want my job to run on 40 cores
>>> with as high a bandwidth as possible. Can I somehow, using this same
>>> solution you have helped me with, get the scheduler to allocate all
>>> 32 cores in enclosure 1 and then allocate the remaining 8 requested
>>> cores in enclosure 2? Or is there some other way of doing this?
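>>>
>>> (One direction I could imagine, untested: a PE that spans all the
>>> hostgroups, per-hostgroup sequence numbers in the queue, and
>>> seqno-based queue sorting in the scheduler:
>>>
>>> seq_no 0,[@testSub1=1],[@testSub2=2]   # in the queue definition
>>> queue_sort_method seqno                # via qconf -msconf
>>> allocation_rule $fill_up               # in the spanning PE
>>>
>>> so that enclosure 1 fills up before enclosure 2 is touched?)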
>>>
>>> //Sofia
>>>
>>>
>>> reuti wrote:
>>>       
>>>> Hi,
>>>>
>>>> On 17.12.2008 at 14:48, Sofia Bassil wrote:
>>>>
>>>>
>>>>         
>>>>> Hi,
>>>>>
>>>>> OK. I think I have followed the instructions exactly, but I get the
>>>>> error, so something must be missing. Here is what I set up:
>>>>>
>>>>> 2 queues, test1 and test2
>>>>>
>>>>>           
>>>> Small side note: if you prefer, you could also stay with one queue
>>>> nowadays:
>>>>
>>>>
>>>>
>>>>         
>>>>> 2 PEs, test_1 and test_2
>>>>> 2 hostgroups, @testSub1 and @testSub2
>>>>> (sorry about the poor naming)
>>>>>
>>>>> # qconf -sq test1
>>>>> qname test1 #test2
>>>>> hostlist @testSub1 #@testSub2
>>>>>
>>>>>           
>>>> hostlist @testSub1 @testSub2
>>>>
>>>>
>>>>
>>>>         
>>>>> seq_no 0
>>>>> load_thresholds np_load_avg=1.50
>>>>> suspend_thresholds NONE
>>>>> nsuspend 1
>>>>> suspend_interval 00:05:00
>>>>> priority 0
>>>>> min_cpu_interval 00:05:00
>>>>> processors UNDEFINED
>>>>> qtype BATCH INTERACTIVE
>>>>> ckpt_list NONE
>>>>> pe_list test_1 #test_2
>>>>>
>>>>>           
>>>> pe_list NONE,[@testSub1=test_1],[@testSub2=test_2]
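>>>>
>>>> i.e. together with the hostlist change above, the single queue would
>>>> then carry both hostgroups (a sketch with your names; the
>>>> [@hostgroup=pe] syntax offers each PE only on the queue instances of
>>>> that hostgroup):
>>>>
>>>> qname test1
>>>> hostlist @testSub1 @testSub2
>>>> pe_list NONE,[@testSub1=test_1],[@testSub2=test_2]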
>>>>
>>>>
>>>>
>>>>         
>>>>> rerun FALSE
>>>>> slots 4
>>>>> tmpdir /tmp
>>>>> shell /bin/bash
>>>>> prolog NONE
>>>>> epilog NONE
>>>>> shell_start_mode unix_behavior
>>>>> starter_method ... terminate_method = NONE
>>>>> notify 00:00:60
>>>>> owner_list ... calendar = NONE
>>>>> initial_state default
>>>>> s_rt ... h_vmem = INFINITY
>>>>>
>>>>> The only things that differ in the queue test2 are qname, hostlist,
>>>>> and pe_list, which are set to test2, @testSub2, and test_2
>>>>> respectively. There are 4 cores per machine and a total of 8
>>>>> machines in each hostgroup.
>>>>>
>>>>> # qconf -sp test_1
>>>>> pe_name test_1
>>>>> slots 24
>>>>>
>>>>>           
>>>> Is this a typo, or intentional to limit the usage? You stated 4x8
>>>> above:
>>>>
>>>> slots 32
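>>>>
>>>> (8 hosts x 4 cores = 32 slots per hostgroup)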
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>
>>>>         
>>>>> user_lists NONE
>>>>> xuser_lists NONE
>>>>> start_proc_args /bin/true
>>>>> stop_proc_args /bin/true
>>>>> allocation_rule $round_robin
>>>>> control_slaves FALSE
>>>>> job_is_first_rank FALSE
>>>>> urgency_slots min
>>>>>
>>>>> The only thing that differs in the PE test_2 is pe_name, which is
>>>>> set to test_2.
>>>>>
>>>>> # qconf -shgrp @testSub1
>>>>> group_name @testSub1
>>>>> hostlist host1.my.domain host2.my.domain host3.my.domain
>>>>> host4.my.domain host5.my.domain \
>>>>>            host6.my.domain host7.my.domain host8.my.domain
>>>>>
>>>>> In testSub2 the group_name and the hostnames differ.
>>>>>
>>>>> When I run a job like this:
>>>>>
>>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>>>
>>>>> I get the error:
>>>>>
>>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>>
>>>>> $ qstat -j <jobid> | grep -v dropped
>>>>> ...
>>>>> sge_o_shell: /bin/bash
>>>>> sge_o_workdir: /home/me
>>>>> sge_o_host: myhost
>>>>> account: sge
>>>>> mail_list: me at myhost.my.domain
>>>>> notify: FALSE
>>>>> job_name: testSub.sh
>>>>> jobshare: 0
>>>>> env_list:
>>>>> script_file: ./testSub.sh
>>>>> parallel environment: test* range: 1
>>>>>     cannot run in queue "anotherq" because PE "test_1" is not in pe list
>>>>>     cannot run in PE "test_1" because it only offers 0 slots
>>>>>     cannot run in queue "anotherq" because PE "test_2" is not in pe list
>>>>>     cannot run in PE "test_2" because it only offers 0 slots
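>>>>>
>>>>> (The PE attachment itself can be checked with, e.g.:
>>>>>
>>>>> $ qconf -spl                       # should list test_1 and test_2
>>>>> $ qconf -sq test1 | grep pe_list   # which PEs this queue offers
>>>>>
>>>>> The "anotherq" lines above just mean that that unrelated queue does
>>>>> not carry these PEs.)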
>>>>>
>>>>> $ cat testSub.sh
>>>>> #!/bin/bash
>>>>> echo hostname=`hostname`
>>>>> sleep 30
>>>>>
>>>>> //Sofia
>>>>>
>>>>>
>>>>> Gerald Ragghianti wrote:
>>>>>
>>>>>           
>>>>>> Hi Sofia,
>>>>>>
>>>>>> Yes, I think that the solution described at
>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21159
>>>>>> is correct for your case. We have a similar situation where we need
>>>>>> to group nodes according to which backend interconnect they use.
>>>>>> This solution does work, but you will need to make sure that the
>>>>>> queue instances for each node are included in the two cluster
>>>>>> queues that you created (one queue for each PE that you have).
>>>>>>
>>>>>> - Gerald
>>>>>>
>>>>>> Sofia Bassil wrote:
>>>>>>
>>>>>>             
>>>>>>> Hi Chris, and thanks for the reply. I will gladly supply
>>>>>>> configuration information, but my first question is whether this
>>>>>>> solution is even relevant for my problem.
>>>>>>>
>>>>>>> //Sofia
>>>>>>>
>>>>>>> craffi wrote:
>>>>>>>
>>>>>>>               
>>>>>>>> Two questions:
>>>>>>>>
>>>>>>>> - What is the output of "qstat -f"?
>>>>>>>> - Have you attached the PEs to any queues?
>>>>>>>>
>>>>>>>> -Chris
>>>>>>>>
>>>>>>>> On Dec 15, 2008, at 10:53 AM, Sofia Bassil wrote:
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am trying to set up a node allocation scheme that depends on
>>>>>>>>> the network layout, following this thread:
>>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21153
>>>>>>>>> I can't get it to work even though I have the same configuration
>>>>>>>>> as far as I can see. I am not using the same version of Grid
>>>>>>>>> Engine, though; I am using GE 6.1u4.
>>>>>>>>>
>>>>>>>>> My cluster basically consists of a few blade enclosures, and I
>>>>>>>>> want it to be possible to ask for cores in one enclosure, so
>>>>>>>>> that you can utilize the better bandwidth within the enclosure.
>>>>>>>>> My plan is to set up each enclosure with one hostgroup, one
>>>>>>>>> queue, and one PE, like in the example.
>>>>>>>>>
>>>>>>>>> When I run a job like this:
>>>>>>>>>
>>>>>>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>>>>>>>
>>>>>>>>> I get the error:
>>>>>>>>>
>>>>>>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>>>>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>>>>>>
>>>>>>>>> My first question is: is it still possible to configure this
>>>>>>>>> (in my version and in 6.2)? My second question is: for my
>>>>>>>>> problem, is this the right solution?
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Sofia Bassil


-- 
Gerald Ragghianti
IT Administrator - High Performance Computing
http://hpc.usg.utk.edu/
Office of Information Technology
University of Tennessee
Phone: 865-974-2448
E-mail: geri at utk.edu




