[GE users] Can I stop backfilling?

Daniel Templeton Dan.Templeton at Sun.COM
Tue May 20 21:16:28 BST 2008


30 might be a little high.  The idea of the reservation limit is 
twofold.  First, it lets you limit the amount of your system that is 
tied up with reservations.  In your case, 30 reservations could easily 
take up the entire system.  Whether that's OK or not is your call.  The 
second thing is that resource reservations consume scheduler time.  
Given the size of your cluster, that's probably not an issue, though.
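To put rough numbers on the first point, here is a small sketch of the worst-case arithmetic. It assumes one slot per core and that every reserving job is the size of the 20-processor parallel job from the original question; the function name is only illustrative and is not part of SGE.

```python
def reserved_fraction(total_slots, max_reservation, slots_per_job):
    """Worst-case share of the cluster held by reservations (capped at 1.0)."""
    reserved = max_reservation * slots_per_job
    return min(reserved, total_slots) / total_slots


TOTAL_SLOTS = 256    # cores reported for this cluster (assumed one slot per core)
SLOTS_PER_JOB = 20   # assumed size of each reserving job

for limit in (1, 5, 13, 30):
    share = reserved_fraction(TOTAL_SLOTS, limit, SLOTS_PER_JOB)
    print(f"max_reservation={limit:2d}: up to {share:.0%} of slots reserved")
```

Under these assumptions the whole cluster can already be reserved with 13 such jobs, so a limit of 30 places no practical cap on reservations.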

Daniel

Kevin Doman wrote:
> For this particular cluster, I have 64 dual core nodes, 256 cores total.
>
>
> On Tue, May 20, 2008 at 2:13 PM, Daniel Templeton <Dan.Templeton at sun.com> wrote:
>   
>> Depends on the size of your cluster and what your priorities are.
>>
>> Daniel
>>
>> Kevin Doman wrote:
>>     
>>> max_reservation = 30; is this sensible?
>>>
>>>
>>> On Tue, May 20, 2008 at 10:25 AM, Reuti <reuti at staff.uni-marburg.de>
>>> wrote:
>>>
>>>       
>>>> On 20.05.2008 at 17:15, Daniel Templeton wrote:
>>>>
>>>>
>>>>         
>>>>> If your jobs are starving, what you're seeing is not backfilling. :)
>>>>> What version of SGE are you using?  There was (is?) a bug where the
>>>>> first resource reservation (RR) job was ignored.  Submitting a second
>>>>> identical RR job, in that case, would then cause the scheduler to take
>>>>> notice and actually do the RR properly.
>>>>>
>>>>> By definition, backfilling cannot cause starvation, unless a backfilled
>>>>> job runs forever.
>>>>>
>>>>>           
>>>> Good point. What h_rt is requested by these short jobs? Otherwise the
>>>> default_duration will be taken (but not enforced), and this might lead
>>>> to a roll-over from one overrunning job (running longer than the
>>>> estimated default 10 minutes) to the next one and so on, I fear.
>>>>
>>>>
>>>>         
>>>>> BTW, when you say your jobs run 15-20 minutes, are they setting soft or
>>>>> hard run time limits?  If not, what is your default_duration?
>>>>>
>>>>> Daniel
>>>>>
>>>>> Kevin Doman wrote:
>>>>>
>>>>>           
>>>>>> We have a very busy cluster that always has thousands of short jobs
>>>>>> (15-20 minutes) in the queue. Occasionally, a user comes in and submits
>>>>>> a 20-processor parallel job with h_rt=100 hours. While reservation is
>>>>>> enabled (-R y) and priority is set to 1024, we continue to experience job
>>>>>>
>>>>>>             
>>>> max_reservation is also set up to a sensible value?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>
>>>>         
>>>>>> backfills, which result in the same 'parallel job starvation' issue.
>>>>>>
>>>>>> Is it possible for me to stop backfilling altogether and let the
>>>>>> parallel jobs go first?
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>

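The roll-over Reuti describes in the quoted thread can be illustrated with a toy model. This is only a sketch, not SGE's actual scheduler logic: it assumes a job is backfilled when its assumed runtime (h_rt if requested, otherwise default_duration) ends before the reservation starts, and that a hard h_rt limit ends the job at that limit. All function names are hypothetical, and times are in minutes.

```python
DEFAULT_DURATION = 10  # minutes; the default estimate mentioned in the thread


def assumed_runtime(h_rt, default_duration=DEFAULT_DURATION):
    """Runtime the scheduler plans with: h_rt if requested, else the default."""
    return h_rt if h_rt is not None else default_duration


def can_backfill(now, reservation_start, h_rt, default_duration=DEFAULT_DURATION):
    """A job may be backfilled only if it is expected to end before the reservation."""
    return now + assumed_runtime(h_rt, default_duration) <= reservation_start


def reservation_delay(now, reservation_start, actual_runtime, h_rt):
    """Minutes the reserved job is pushed back by one backfilled job."""
    # Assumption: a hard h_rt limit ends the job at h_rt; without one,
    # nothing stops the job from running past its estimate.
    runtime = actual_runtime if h_rt is None else min(actual_runtime, h_rt)
    return max(0, now + runtime - reservation_start)


# A 20-minute job with no h_rt looks like a 10-minute job to the scheduler.
print(can_backfill(0, 10, h_rt=None))                           # backfilled?
print(reservation_delay(0, 10, actual_runtime=20, h_rt=None))   # delay caused

# The same job requesting h_rt=20 is not expected to fit in front of the reservation.
print(can_backfill(0, 10, h_rt=20))
```

In this model each short job that overruns an unenforced default_duration delays the reserved job, and the next backfilled job can do the same, which is why asking the short jobs to request a realistic h_rt matters.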