[GE users] Allocate cores in the same enclosure

Gerald Ragghianti geri at utk.edu
Wed Dec 17 22:50:12 GMT 2008


Where can I find documentation on this abbreviated queue definition 
format?  Is it supported in 6.1u5? Can it work with more than two 
hostgroups?

qname test1 #test2
hostlist @testSub1 #@testSub2


- Gerald
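
For reference, the `#` parts in the two lines above look like Sofia's own shorthand for "the value in queue test2" rather than Grid Engine syntax. What Reuti's single-queue suggestion (quoted below) relies on is the per-hostgroup override form `[@hostgroup=value]`. Here is a minimal sketch of how that form would extend to more than two hostgroups, assuming the PEs are named test_1, test_2, ... by position. The helper `build_pe_list`, the hostgroup `@testSub3` and the PE `test_3` are hypothetical and do not appear in this thread:

```shell
#!/bin/sh
# Hypothetical helper: build a pe_list value with one override per hostgroup.
# Assumes the PE for the Nth hostgroup is called test_N (naming as in the thread).
build_pe_list() {
    out="NONE"          # default for hosts not covered by any override
    i=1
    for hg in "$@"; do
        out="${out},[${hg}=test_${i}]"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Three enclosures instead of two; @testSub3 is an assumed extra hostgroup.
build_pe_list @testSub1 @testSub2 @testSub3
# prints: NONE,[@testSub1=test_1],[@testSub2=test_2],[@testSub3=test_3]
```

The printed value would go on the queue's pe_list line (edited with `qconf -mq <queue>`), with every hostgroup also listed on the hostlist line.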

Sofia Bassil wrote:
> Hi Reuti,
>
> Using one queue the way you specified makes my job run, and it 
> allocated cores correctly. Thank you! Plus it's a slimmer solution 
> whose configuration is easier to read. Very nice!
> The fact that it works with one queue but not two, is that a 
> difference in the versions of Grid Engine? Is it supposed to work with 
> 2 queues as well in 6.1?
> The 24 was a result of sloppy copying.
>
> A follow up question:
> Is there a way to allocate jobs first in one hostgroup, and then in 
> another? Say I have 3 blade enclosures with 8 machines with 4 CPUs in 
> each. Enclosure 1 and 2 are right next to each other while enclosure 3 
> is in another location. I want my job to run on 40 cores with as high 
> bandwidth as possible. Can I somehow, using this same solution you 
> have helped me with, get the scheduler to allocate all the 32 cores in 
> enclosure 1 and then allocate the remaining 8 requested cores in 
> enclosure 2? Or is there some other way of doing this?
>
> //Sofia
>
>
> reuti wrote:
>> Hi,
>>
>> On 17.12.2008 at 14:48, Sofia Bassil wrote:
>>
>>   
>>> Hi,
>>>
>>> OK. I think I have followed the instructions exactly, but I get the  
>>> error so something must be missing. Here is what I set up:
>>>
>>> 2 queues, test1 and test2
>>>     
>>
>> A small side note: if you prefer, you could also stay with one queue
>> nowadays:
>>
>>
>>   
>>> 2 PEs, test_1 and test_2
>>> 2 hostgroups, @testSub1 and @testSub2
>>> (sorry about the poor namings)
>>>
>>> # qconf -sq test1
>>> qname test1 #test2
>>> hostlist @testSub1 #@testSub2
>>>     
>>
>> hostlist @testSub1 @testSub2
>>
>>
>>   
>>> seq_no 0
>>> load_thresholds np_load_avg=1.50
>>> suspend_thresholds NONE
>>> nsuspend 1
>>> suspend_interval 00:05:00
>>> priority 0
>>> min_cpu_interval 00:05:00
>>> processors UNDEFINED
>>> qtype BATCH INTERACTIVE
>>> ckpt_list NONE
>>> pe_list test_1 #test_2
>>>     
>>
>> pe_list NONE,[@testSub1=test_1],[@testSub2=test_2]
>>
>>
>>   
>>> rerun FALSE
>>> slots 4
>>> tmpdir /tmp
>>> shell /bin/bash
>>> prolog NONE
>>> epilog NONE
>>> shell_start_mode unix_behavior
>>> starter_method.....terminate_method=NONE
>>> notify 00:00:60
>>> owner_list....calendar=NONE
>>> initial_state default
>>> s_rt....h_vmem=INFINITY
>>>
>>> The only thing that differs in the queue test2 is qname, hostlist,  
>>> and pe_list which are set to test2, @testSub2, and test_2  
>>> respectively. There are 4 cores per machine and a total of 8  
>>> machines in each hostgroup.
>>>
>>> # qconf -se test_1
>>> pe_name test_1
>>> slots 24
>>>     
>>
>> Is this a typo, or intentional to limit the usage? You stated 4x8 above:
>>
>> slots 32
>>
>> -- Reuti
>>
>>
>>   
>>> user_lists NONE
>>> xuser_lists NONE
>>> start_proc_args /bin/true
>>> stop_proc_args /bin/true
>>> allocation_rule $round_robin
>>> control_slaves FALSE
>>> job_is_first_rank FALSE
>>> urgency_slots min
>>>
>>> The only thing that differs in the PE test_2 is pe_name which is  
>>> set to test_2.
>>>
>>> # qconf -shgrp @testSub1
>>> group_name @testSub1
>>> hostlist host1.my.domain host2.my.domain host3.my.domain \
>>>          host4.my.domain host5.my.domain host6.my.domain \
>>>          host7.my.domain host8.my.domain
>>>
>>> In testSub2 the group_name and the hostnames differ.
>>>
>>> When I run a job like this:
>>> $ qsub -pe "test*" 1 ./testSub.sh
>>> I get the error:
>>> cannot run in PE "test_1" because it only offers 0 slots
>>> cannot run in PE "test_2" because it only offers 0 slots
>>>
>>> $ qstat -j <jobid>|grep -v dropped
>>> ...
>>> sge_o_shell: /bin/bash
>>> sge_o_workdir: /home/me
>>> sge_o_host: myhost
>>> account: sge
>>> mail_list: me at myhost.my.domain
>>> notify: FALSE
>>> job_name: testSub.sh
>>> jobshare: 0
>>> env_list:
>>> script_file: ./testSub.sh
>>> parallel environment: test* range: 1
>>>                                         cannot run in queue "anotherq" because PE "test_1" is not in pe list
>>>                                         cannot run in PE "test_1" because it only offers 0 slots
>>>                                         cannot run in queue "anotherq" because PE "test_2" is not in pe list
>>>                                         cannot run in PE "test_2" because it only offers 0 slots
>>>
>>> $ cat testSub.sh
>>> echo hostname=`hostname`
>>> sleep 30
>>>
>>> //Sofia
>>>
>>>
>>> Gerald Ragghianti wrote:
>>>     
>>>> Hi Sofia,
>>>>
>>>> Yes, I think that the solution described at
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21159
>>>> is correct for your case. We have a similar situation where we need to
>>>> group nodes according to which backend interconnect they use. This
>>>> solution does work, but you will need to make sure that the queue
>>>> instances for each node are included in the two cluster queues that you
>>>> created (one queue for each PE that you have).
>>>>
>>>> - Gerald
>>>>
>>>> Sofia Bassil wrote:
>>>>       
>>>>> Hi Chris and thanks for the reply. I will gladly supply
>>>>> configuration information, but my first question is if this
>>>>> solution is even relevant for my problem?
>>>>>
>>>>> //Sofia
>>>>>
>>>>> craffi wrote:
>>>>>         
>>>>>> 2 questions --
>>>>>> - What is the output of "qstat -f" ?
>>>>>> - Have you attached the PEs to any queues?
>>>>>>
>>>>>> -Chris
>>>>>>
>>>>>> On Dec 15, 2008, at 10:53 AM, Sofia Bassil wrote:
>>>>>>           
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am trying to set up a node allocation scheme depending on network
>>>>>>> layout, following this thread:
>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=21153
>>>>>>>
>>>>>>> I can't get it to work even though I have the same configuration as far
>>>>>>> as I can see. I am not using the same version of Grid Engine though, I
>>>>>>> am using GE 6.1u4. My cluster basically consists of a few blade
>>>>>>> enclosures and I want it to be possible to ask for cores in one
>>>>>>> enclosure, so that you can utilize the better bandwidth within the
>>>>>>> enclosure. My plan is to set up all enclosures with one hostgroup, one
>>>>>>> queue, and one pe each, like in the example.
>>>>>>>
>>>>>>> When I run a job like this:
>>>>>>> $ qsub -pe "test*" 1 ./testSub.sh
>>>>>>> I get the error:
>>>>>>> cannot run in PE "test_1" because it only offers 0 slots
>>>>>>> cannot run in PE "test_2" because it only offers 0 slots
>>>>>>>
>>>>>>> My first question is, is it still possible to configure this (in my
>>>>>>> version and in 6.2)? My second question is, for my problem, is this the
>>>>>>> right solution?
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Sofia Bassil
>>>>>>>             
>>>>> ------------------------------------------------------
>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92775
>>>>>
>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>>         
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92980
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>>   


-- 
Gerald Ragghianti
IT Administrator - High Performance Computing
http://hpc.usg.utk.edu/
Office of Information Technology
University of Tennessee
Phone: 865-974-2448
E-mail: geri at utk.edu

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=93052

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
