[GE users] Concurrent queue

Daniel Templeton Dan.Templeton at Sun.COM
Tue Jul 8 22:32:47 BST 2008



The answer is to create an INT resource for each queue pair that is 
consumable but not requestable, with a default of 1.  Assign 6 of that 
resource to the host and 4 to each queue.  Any job that runs in either 
queue will be charged the default resource usage of 1.  When six jobs 
are running across both queues, the host's resource count will be 
exhausted, and no more jobs will be allowed to start in either queue.  
Because the resource is unique to each queue pair, the pairs will not 
interfere with each other.
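
For concreteness, here is roughly what that looks like with qconf (the
resource, host, and queue names below are placeholders; pick your own):

    # Add the complex with "qconf -mc"; the new row reads:
    #name       shortcut  type  relop  requestable  consumable  default  urgency
    pairlimit   pl        INT   <=     NO           YES         1        0

    # Give the host a total of 6 ("qconf -me node01"):
    complex_values    pairlimit=6

    # Give each queue of the pair 4 ("qconf -mq big.q", same for small.q):
    complex_values    pairlimit=4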

Daniel

Chaffard Remi wrote:
> I'm going back to the original question to give more details.
>
> You told me to put slots=8 on each exec host, and it works fine, but now
> we want to make things more complicated. We want to add some queues with
> high priority. Jobs in these queues should be able to run even if the host
> is overloaded (we have a queue called smalljob in which we run jobs that
> do not need a lot of resources).
>
> If I understand correctly, we can't keep slots=8 on each node, because if
> the node is full, jobs in smalljob will not run.
>
> So the question is still the same: can we define a test queue which does
> not impact the production queues defined on the host? Or how can we make
> jobs in smalljob run even if slots is set to 8 on each node?
> We are running SGE 6u4.
>
> Thanks for your help
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: Tuesday, 2 October 2007 13:15
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Concurrent queue
>
> On 01.10.2007 at 23:21, Heywood, Todd wrote:
>
>   
>>>> SGE has a lot of flexibility for dealing with priorities of pending
>>>> jobs, and with scheduling. The resource quotas are nice for preventing
>>>> users from starting (running) jobs which take up all cluster resources.
>>>> But we have the probably classic problem of wanting to allow users to
>>>> use as many slots as are available when they submit jobs, as long as
>>>> their *running* jobs don't prevent other users who come in behind them
>>>> from having their jobs run.
>>>>         
>>> Are these long-running jobs? Often fair-share scheduling is enough
>>> to give all users the same number of jobs in the cluster over time.
>>>       
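
A rough sketch of the simplest variant of this, functional fair-share
tickets; the attribute names are from sched_conf(5) and sge_conf(5), and
the values are only illustrative:

    # Scheduler configuration ("qconf -msconf"):
    weight_tickets_functional    10000

    # Global configuration ("qconf -mconf global") -- create users
    # automatically, each with the same functional share:
    enforce_user        auto
    auto_user_fshare    100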
>> Yes, we get the occasional large set of long-running jobs which can eat
>> up the majority of slots in the cluster. At the same time we don't want
>> to restrict them to fewer than the available number of slots, since our
>> usage is spiky/volatile, and getting them finished ASAP is desirable.
>>
>>     
>>>> I know there is no magic bullet. But does anyone have any specific
>>>> suggestions besides a general priority-queue set-up with subordinate
>>>> queues having every job suspended?
>>>>         
>>> What about setting a suspend_threshold for public.q? When now.q
>>> generates load that exceeds this limit (nowadays, with cores=slots, it
>>> should be fine to set np_load_avg=1), some of the jobs in public.q will
>>> be suspended during each time interval; the number per interval is
>>> adjustable, in this case to 1 (man queue_conf).
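
A minimal sketch of that setup, assuming one suspended job per interval
(the attribute names are from queue_conf(5); the interval shown is just
the default):

    # "qconf -mq public.q" -- suspend one public.q job per interval
    # while the normalized load average is at or above 1:
    suspend_thresholds    np_load_avg=1.00
    nsuspend              1
    suspend_interval      00:05:00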
>>>
>>> The resources (e.g. memory) of the suspended jobs will of course still
>>> be tied up in public.q, unless you additionally define a checkpointing
>>> environment, in which case a suspend might trigger a migration of the
>>> job. SGE will support checkpointing if it's in the application/OS
>>> already; otherwise the job will restart from the beginning.
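
A skeleton of such a checkpointing environment, assuming an application
that can write and resume from its own checkpoint files; the field names
are from checkpoint(5), and everything else below (paths, the "when"
letters) is a placeholder to verify against the man page:

    # "qconf -ackpt migrate":
    ckpt_name          migrate
    interface          userdefined
    ckpt_command       /path/to/write_checkpoint.sh
    migr_command       /path/to/migrate.sh
    restart_command    none
    clean_command      none
    ckpt_dir           /var/spool/ckpt
    signal             none
    when               xs    # x = on suspension, s = on execd shutdown

    # Jobs opt in at submission time:
    qsub -ckpt migrate job.sh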
>>>
>>>       
>> That's not a bad idea. Also, we have a virtual_free consumable defined,
>> so presumably now.q jobs won't be scheduled on a node unless there is
>> enough memory left that the public.q jobs don't need. I don't suppose it
>> is possible with a suspend_threshold to tell SGE to pick the jobs to
>> suspend which are using the least memory? :-)
>>     
>
> I don't know either which of the jobs the internal algorithm will pick
> for suspension.
>
> But you have to configure a little more virtual_free in this case, as
> both queues consume it.
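
For example (all figures are assumptions): if a suspended public.q job
keeps a virtual_free=4G reservation while the now.q job that displaced
it books another 4G, the node must be able to account for both at once,
so its bookable total needs headroom beyond what one queue alone would
use:

    # "qconf -me node01" on a node whose jobs may be doubled up
    # while suspended (names and numbers illustrative):
    complex_values    virtual_free=36G

    # Jobs in both queues request memory as usual:
    qsub -l virtual_free=4G job.sh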
>
> -- Reuti
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



