[GE users] I need help with GE configuration

Reuti reuti at staff.uni-marburg.de
Mon Sep 1 13:26:48 BST 2008


Hi,

On 01.09.2008, at 14:15, Esteban Freire wrote:

> Hi Reuti,
>
> First of all, thanks for answering me :)
>
> Reuti wrote:
>> Hi,
>>
>> On 28.08.2008, at 12:52, Esteban Freire wrote:
>>
>>> Hi all,
>>>
>>> I'm getting some problems with my GE configuration. I don't
>>> understand what I'm missing, so I would appreciate your help. By
>>> the way, I'm using GE 6.1u3.
>>>
>>> I'm attaching to this e-mail the configuration files which I
>>> consider most important. The problem is the following:
>>>
>>> The WNs which I have configured with GE have 8 processors. Almost
>>> all of them are configured as *complex_values
>>> num_proc=8,s_vmem=8G*, but three of them are configured as
>>> *complex_values num_proc=9,s_vmem=9G*, so that only one queue can
>>> see these extra processors.
>>>
>>> Then, I have configured almost all of the queues as:
>>>
>>> [ .... ]
>>> slots                 8,[wn001.egee.cesga.es=8],[wn002.egee.cesga.es=4], \
>>>                       [wn004.egee.cesga.es=8],[wn005.egee.cesga.es=8], \
>>>                       [wn006.egee.cesga.es=8],[wn007.egee.cesga.es=8], \
>>>                       [wn008.egee.cesga.es=8],[wn009.egee.cesga.es=8], \
>>>                       [wn010.egee.cesga.es=8],[wn011.egee.cesga.es=8], \
>>>                       [wn012.egee.cesga.es=8],[wn013.egee.cesga.es=8], \
>>>                       [wn014.egee.cesga.es=8]
>>>
>>> [ .... ]
>>> complex_values        num_proc=8,s_vmem=8
>> I would suggest never touching num_proc anywhere. It's a fixed
>> attribute of a machine. Independent of the number of cores actually
>> present in a machine, you can define as many slots as you like and
>> oversubscribe the node this way.
>>
>> Also, your per-user RQS can be implemented by using slots instead
>> of num_proc.
>>
>> When slots is only 8 in the above queue specification, 9 jobs can't
>> run in it (you should never see an output of 9/8, i.e. 9 used out
>> of 8, in qstat). What you might notice is an oversubscription of a
>> node due to the combined usage from all queues. For this limit you
>> would need an additional RQS limiting the number of slots per
>> machine to 8 (only the normal queues) and 9 (all queues including
>> the special queue):
>>
>> limit name normal queues alice, atlas, biomed, cesga, (all
>> additional queues here) hosts {*} to slots=8
>>
>>
>> and a second RQS:
>>
>> limit name all queues alice, atlas, biomed, cesga, (all additional
>> queues here), ops hosts {*} to slots=9
>>
>>
>> -- Reuti
>
> OK. I have set num_proc because we have configured num_proc as
> consumable (<=    FORCED      YES        0        0) and we require
> the users to request num_proc and s_vmem in the qsub. The problem
> is that if I don't configure num_proc on the WNs and I don't submit
> jobs requesting num_proc, then I cannot see with qhost how many
> CPUs are being used in the *NCPU* column; I can see the load, but
> it doesn't count the busy CPUs.

So you defined it just for convenient output? For slots you could
just use:

qhost -F slots

but when you use RQS, neither option will work any longer anyway.
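
For reference, the two RQS from above written out as complete
resource quota sets (just a sketch - the rule set names and
descriptions are placeholders, and you have to add your remaining
queues to the lists), entered with "qconf -arqs":

{
   name         slots_normal_queues
   description  "max. 8 slots per host for the normal queues"
   enabled      TRUE
   limit        queues alice,atlas,biomed,cesga hosts {*} to slots=8
}

{
   name         slots_all_queues
   description  "max. 9 slots per host across all queues incl. ops"
   enabled      TRUE
   limit        queues alice,atlas,biomed,cesga,ops hosts {*} to slots=9
}

Both rule sets apply at the same time, so the normal queues stay at
8 slots per host and only the ops queue can use the 9th one.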

> On the other hand, I have tested the additional RQS rules that you
> suggested. The problem is that this is sequential; I mean, it
> starts looking at which nodes have num_proc free, but if the first
> node to check in the list is busy, it doesn't keep looking.
> Therefore, this is not useful for me, because in the end I have
> free CPUs which I cannot use, since it doesn't look at those nodes.

No, this shouldn't happen. Are you using "queue_sort_method
seqno" while observing this? Can you set it to sort by load
(np_load_avg) instead?
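
Just to show where this lives (a quick sketch, nothing specific to
your setup): the scheduler configuration can be inspected and edited
with

qconf -ssconf | grep -E 'queue_sort_method|load_formula'
qconf -msconf

and sorting by load instead of sequence number means setting

queue_sort_method                 load

in there; the load value itself comes from load_formula, which is
np_load_avg by default.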

--- Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



