[GE users] subordination and consumables

Reuti reuti at staff.uni-marburg.de
Thu Nov 22 14:59:00 GMT 2007


On 22.11.2007, at 15:32, Ross Dickson wrote:

> Okay, thanks.  I take h_vmem out of the complex_values line in
> "qconf -me <host>" and I put it in the complex_values line in
> "qconf -mq <queue>".  That seems to work.
>
> I've got different amounts of memory in different hosts, though.
> Is there a way to handle that without creating separate cluster
> queues for them?

Yes, you can override the value per host group in the queue definition ("qconf -mq <queue>"), something like:

h_vmem                4G,[@nodes_8G=8G],[@nodes_16G=16G]
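A complete sketch (assuming the host groups @nodes_8G and @nodes_16G
don't exist yet; the names are just illustrative):

# create the host groups once; each command opens an editor where
# you list the member hosts in the "hostlist" field
qconf -ahgrp @nodes_8G
qconf -ahgrp @nodes_16G

# then put the per-host-group overrides on the h_vmem line in
# "qconf -mq <queue>":
h_vmem                4G,[@nodes_8G=8G],[@nodes_16G=16G]

A job submitted with e.g. "qsub -l h_vmem=6G ..." should then only be
dispatched to queue instances on the 8G or 16G nodes.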

-- Reuti


> - Ross
>
>
> Reuti wrote:
>> Hi,
>>
>> Am 21.11.2007 um 20:20 schrieb Ross Dickson:
>>
>>> We have cluster queues configured here to prefer parallel jobs  
>>> over serial, and some configured to prefer jobs of the node  
>>> "owner" over jobs of other users.
>>>
>>> > qconf -sq all.q | grep subordinate
>>> subordinate_list      serial.q=1
>>>
>>> We have h_vmem configured as a consumable:
>>>
>>> > qconf -sc | grep h_vmem
>>> h_vmem              h_vmem     MEMORY      <=    YES    YES    2G    0
>>> > qconf -se cl005 | grep h_vmem
>>> complex_values        h_vmem=15G
>>
>> What is the queue-level setting for h_vmem? If you intend to
>> oversubscribe the memory, you can leave it out of the exec host
>> definition; the queue-level value then gives the absolute limit.
>> This should be possible in your setup, as the queues run exclusively.
>>
>> -- Reuti
>>
>>
>>> ...and this host, for example, has 4 slots in both queues.
>>>
>>> The problem is this: if there are serial (or non-owner) jobs
>>> running and a parallel (or owner) job is submitted which could
>>> use the slots, Grid Engine will only schedule it if there is
>>> sufficient h_vmem *already* free to run the superordinate job.
>>> It would make so much sense if Grid Engine looked at the h_vmem
>>> the running jobs have consumed, and reasoned that since they are
>>> in a subordinate queue they will be suspended and their memory
>>> (temporarily) released (or swapped out). It could then count the
>>> h_vmem reserved by jobs in the subordinate queue as available
>>> when calculating whether the parallel job can run.
>>>
>>> But it doesn't.
>>>
>>> Is there some way we can get subordination to work as desired  
>>> without throwing away the protection afforded by h_vmem consumable?
>
> -- 
> Ross Dickson         HPC Consultant
> ACEnet               http://www.ace-net.ca
> +1 902 494 6710      Skype: ross.m.dickson
