[GE users] Concurrent queue

Daniel Templeton Dan.Templeton at Sun.COM
Tue Jul 8 22:35:44 BST 2008



Um...  Just noticed that I very thoughtfully answered an email that was 
over 8 months old.  Oops.  I certainly hope that Chaffard wasn't still 
waiting for an answer. :)

Daniel

Daniel Templeton wrote:
> The answer is to create an INT resource for each queue pair that is 
> consumable but not requestable, with a default of 1.  Assign 6 of that 
> resource to the host and 4 to each queue.  Any job that runs in either 
> queue is charged the default resource usage of 1.  Once six jobs are 
> running across both queues, the host's resource count is exhausted, and 
> no more jobs will be allowed to run in either queue; a single queue is 
> additionally limited to four.  Because the resource is unique to each 
> queue pair, queue pairs will not interfere with each other.
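>
> A rough sketch of that setup (the complex name "pair_limit", the host 
> "node01", and the queues "queue_a"/"queue_b" are illustrative):
>
>    # 1. add the complex via "qconf -mc": consumable, not requestable,
>    #    default usage of 1 per job
>    #name       shortcut  type  relop  requestable  consumable  default  urgency
>    pair_limit  pl        INT   <=     NO           YES         1        0
>
>    # 2. give the execution host 6 of the resource
>    qconf -aattr exechost complex_values pair_limit=6 node01
>
>    # 3. give each queue of the pair 4 of the resource
>    qconf -aattr queue complex_values pair_limit=4 queue_a
>    qconf -aattr queue complex_values pair_limit=4 queue_b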
>
> Daniel
>
> Chaffard Remi wrote:
>> I'm going back to the original question to give more details.
>>
>> You told me to put slots=8 on each exec host and it works fine, but now
>> we want to make it more complicated. We want to add some queues with
>> high priority. Jobs in these queues should be able to run even if the
>> host is overloaded (we have a queue called smalljob in which we run jobs
>> that do not need a lot of resources).
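>>
>> For reference, a per-host slot limit like that is typically set along
>> these lines (the host name "node01" is illustrative):
>>
>>    qconf -aattr exechost complex_values slots=8 node01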
>>
>> If I understand correctly, we can't keep slots=8 on each node, because
>> if a node is full, the jobs in smalljob will not run.
>>
>> So the question is still the same: can we define a test queue which
>> does not impact the production queues defined on the host? Or how can
>> we make jobs in smalljob run even if slots is set to 8 on each node?
>> We are running SGE 6u4.
>>
>> Thanks for the help
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Tuesday, 2 October 2007 13:15
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Concurrent queue
>>
>> On 01.10.2007, at 23:21, Heywood, Todd wrote:
>>
>>  
>>>>> SGE has a lot of flexibility for dealing with priorities of pending
>>>>> jobs, and with scheduling. The resource quotas are nice for
>>>>> preventing users from starting (running) jobs which take up all
>>>>> cluster resources. But we have the probably classic problem of
>>>>> wanting to allow users to use as many slots as are available when
>>>>> they submit jobs, as long as their *running* jobs don't prevent
>>>>> other users that come in behind them from having their jobs run.
>>>>>         
>>>> Are these long-running jobs? Often fair-share scheduling is enough
>>>> to give all users the same number of jobs in the cluster over time.
>>>>       
>>> Yes, we get the occasional large set of long-running jobs which can
>>> eat up the majority of slots in the cluster. At the same time we don't
>>> want to restrict them to fewer than the available number of slots,
>>> since our usage is spiky/volatile, and getting them finished ASAP is
>>> desirable.
>>>
>>>    
>>>>> I know there is no magic bullet. But does anyone have any specific
>>>>> suggestions besides a general priority-queue set-up with subordinate
>>>>> queues having every job suspended?
>>>>>         
>>>> What about setting a suspend_threshold for public.q? When now.q
>>>> generates load that exceeds this limit (nowadays, with cores=slots,
>>>> it should be fine to set np_load_avg=1), some of the jobs in public.q
>>>> will be suspended during each time interval; the number suspended per
>>>> interval is adjustable, in this case to 1 (man queue_conf).
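>>>>
>>>> A sketch of the relevant public.q settings (values are illustrative;
>>>> see "man queue_conf" for the details):
>>>>
>>>>    # excerpt from "qconf -mq public.q"
>>>>    suspend_thresholds    np_load_avg=1.0
>>>>    nsuspend              1
>>>>    suspend_interval      00:05:00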
>>>>
>>>> The resources (e.g. memory) will of course still be held by the
>>>> suspended public.q jobs, unless you also define a checkpointing
>>>> environment in which a suspend can trigger a migration of the job.
>>>> SGE will support checkpointing if it's already in the application/OS;
>>>> otherwise the job will restart from the beginning.
>>>>
>>>>       
>>> That's not a bad idea. Also, we have a virtual_free consumable
>>> defined, so presumably now.q jobs won't be scheduled on a node unless
>>> there is enough memory left over that the public.q jobs don't need. I
>>> don't suppose it is possible with a suspend_threshold to tell SGE to
>>> pick the jobs to suspend which are using the least memory? :-)
>>>     
>>
>> I don't know the internal algorithm either, i.e. which of the jobs
>> will be suspended.
>>
>> But you have to give a little bit more virtual_free in this case, as
>> both queues consume it.
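>>
>> For example, a sketch assuming the host gets some extra headroom
>> configured (the host name "node01" and the sizes are illustrative):
>>
>>    # size the per-host virtual_free consumable generously, since
>>    # suspended public.q jobs keep holding their memory
>>    # (use -mattr instead if complex_values already has a virtual_free entry)
>>    qconf -aattr exechost complex_values virtual_free=30G node01
>>
>>    # jobs then request memory explicitly, e.g.
>>    qsub -l virtual_free=2G job.sh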
>>
>> -- Reuti
>>
