[GE users] subordination and consumables

Ross Dickson Ross.Dickson at dal.ca
Thu Nov 22 14:32:53 GMT 2007



Okay, thanks.  I took h_vmem out of the complex_values line in "qconf -me 
<host>" and put it in the complex_values line in "qconf -mq <queue>".   That 
seems to work.
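
For the record, the edits were roughly as follows (cl005 and all.q are 
just the examples from the quoted output below; I'm sketching the 
relevant fields rather than pasting them verbatim):

   In "qconf -me cl005" (drop h_vmem from the exec host entry):
       complex_values        NONE

   In "qconf -mq all.q" (and carry the limit on the queue instead):
       complex_values        h_vmem=15G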

I've got different amounts of memory on different hosts, though.  Is 
there a way to handle that without creating separate cluster queues for 
them?
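
(One thing I'm wondering about, but haven't tested: cluster queue 
attributes can apparently be overridden per host with the bracket 
syntax, so perhaps something like

   In "qconf -mq all.q":
       complex_values   h_vmem=7G,[cl005=h_vmem=15G],[cl006=h_vmem=31G]

would let a single cluster queue carry a different h_vmem limit on each 
host.  The 7G default and the cl006 entry are made up for illustration; 
corrections welcome.)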

- Ross


Reuti wrote:
> Hi,
>
> Am 21.11.2007 um 20:20 schrieb Ross Dickson:
>
>> We have cluster queues configured here to prefer parallel jobs over 
>> serial, and some configured to prefer jobs of the node "owner" over 
>> jobs of other users.
>>
>> > qconf -sq all.q | grep subordinate
>> subordinate_list      serial.q=1
>>
>> We have h_vmem configured as a consumable:
>>
>> > qconf -sc | grep h_vmem
>> h_vmem              h_vmem     MEMORY      <=    YES         YES        2G       0
>> > qconf -se cl005 | grep h_vmem
>> complex_values        h_vmem=15G
>
> what is the h_vmem setting at the queue level? If you intend to 
> oversubscribe the memory, you can leave it out of the exec host 
> definition, and the queue-level value gives the absolute limit. This 
> should be possible in your setup, as the queues run exclusively.
>
> -- Reuti
>
>
>> ...and this host, for example, has 4 slots in both queues.
>>
>> The problem is this:  If there are serial (or non-owner) jobs running 
>> and a parallel (or owner) job is submitted which could use the slots, 
>> Grid Engine will only schedule it if there is sufficient h_vmem 
>> *already* free to run the superordinate job.
>> It would make so much sense if Grid Engine looked at what h_vmem 
>> the running jobs have consumed, and reasoned that since they are in a 
>> subordinate queue they will be suspended and their memory 
>> (temporarily) released (or swapped out).  It could therefore use the 
>> h_vmem reserved for jobs in the subordinate queue to calculate 
>> whether the parallel job can run.
>>
>> But it doesn't.
>>
>> Is there some way we can get subordination to work as desired without 
>> throwing away the protection afforded by the h_vmem consumable?

-- 
Ross Dickson         HPC Consultant
ACEnet               http://www.ace-net.ca
+1 902 494 6710      Skype: ross.m.dickson
