[GE users] Can I stop backfilling?

Kevin Doman kdoman07 at gmail.com
Tue May 20 20:24:00 BST 2008



For this particular cluster, I have 64 dual core nodes, 256 cores total.
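
The roll-over effect Reuti describes in the quoted thread below can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how the SGE scheduler is actually implemented: one slot, time in minutes, and every name (ShortJob, may_backfill, reservation_delay) is hypothetical. The 10-minute default_duration and the 15-minute job come from the thread; the reservation start at minute 12 is made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShortJob:
    actual_minutes: int                 # how long the job really runs
    h_rt_minutes: Optional[int] = None  # requested hard run time limit, if any


def assumed_runtime(job: ShortJob, default_duration: int) -> int:
    """Runtime the scheduler plans with: h_rt if requested, else default_duration."""
    if job.h_rt_minutes is not None:
        return job.h_rt_minutes
    return default_duration


def may_backfill(job: ShortJob, now: int, reservation_start: int,
                 default_duration: int) -> bool:
    """A job is backfilled only if its assumed end is not after the reservation."""
    return now + assumed_runtime(job, default_duration) <= reservation_start


def reservation_delay(job: ShortJob, now: int, reservation_start: int,
                      default_duration: int) -> int:
    """Minutes by which the reserved job slips if this job is backfilled."""
    if not may_backfill(job, now, reservation_start, default_duration):
        return 0  # job is not started, so the reservation is untouched
    if job.h_rt_minutes is not None:
        # h_rt is enforced: the job is stopped at the limit at the latest
        end = now + min(job.actual_minutes, job.h_rt_minutes)
    else:
        # default_duration is only an estimate and is not enforced
        end = now + job.actual_minutes
    return max(0, end - reservation_start)


if __name__ == "__main__":
    no_limit = ShortJob(actual_minutes=15)                # no h_rt requested
    honest = ShortJob(actual_minutes=15, h_rt_minutes=20)
    for job in (no_limit, honest):
        print(job, reservation_delay(job, now=0, reservation_start=12,
                                     default_duration=10))
```

In this model the job without h_rt is backfilled on the strength of the 10-minute estimate and then overruns the reservation, while the job that requests a realistic h_rt is simply not backfilled into the gap. If each overrunning job is followed by another one like it, the reservation keeps slipping, which would look like starvation of the parallel job.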


On Tue, May 20, 2008 at 2:13 PM, Daniel Templeton <Dan.Templeton at sun.com> wrote:
> Depends on the size of your cluster and what your priorities are.
>
> Daniel
>
> Kevin Doman wrote:
>>
>> max_reservation = 30. Is this sensible?
>>
>>
>> On Tue, May 20, 2008 at 10:25 AM, Reuti <reuti at staff.uni-marburg.de>
>> wrote:
>>
>>>
>>> Am 20.05.2008 um 17:15 schrieb Daniel Templeton:
>>>
>>>
>>>>
>>>> If your jobs are starving, what you're seeing is not backfilling. :)
>>>> What version of SGE are you using?  There was (is?) a bug where the
>>>> first RR job was ignored.  Submitting a second identical RR job, in
>>>> that case, would then cause the scheduler to take notice and actually
>>>> do the RR properly.
>>>>
>>>> By definition, backfilling cannot cause starvation, unless a backfilled
>>>> job runs forever.
>>>>
>>>
>>> Good point. What h_rt is requested by these short jobs? If none is
>>> requested, the default_duration will be assumed (but not enforced), and
>>> this might lead to a roll-over from one overrunning job (running longer
>>> than the estimated default of 10 minutes) to the next one and so on, I
>>> fear.
>>>
>>>
>>>>
>>>> BTW, when you say your jobs run 15-20 minutes, are they setting soft or
>>>> hard run time limits?  If not, what is your default_duration?
>>>>
>>>> Daniel
>>>>
>>>> Kevin Doman wrote:
>>>>
>>>>>
>>>>> We have a very busy cluster that always has thousands of short jobs
>>>>> (15-20 minutes) in the queue. Occasionally, a user comes in and submits
>>>>> a 20 processor parallel job with h_rt=100 hours. While reservation is
>>>>> enabled (-R y) and priority is set to 1024, we continue to experience job
>>>>>
>>>
>>> Is max_reservation also set to a sensible value?
>>>
>>> -- Reuti
>>>
>>>
>>>
>>>>>
>>>>> backfills which resulted in the same 'parallel job starvation' issue.
>>>>>
>>>>> Is it possible for me to stop backfilling altogether and let the
>>>>> parallel jobs go first?
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>
>>>>>
>>>>>



