[GE users] SGE-6.2u5: Slot reservation for different queues?

reuti reuti at staff.uni-marburg.de
Tue Jul 13 16:05:31 BST 2010


On 13.07.2010 at 10:35, soyez wrote:

> Good morning Reuti,
> 
> On Mon, 12 Jul 2010, reuti wrote:
>> 		:
>> 		:
>> For me it seems to work; in 6.2u5 the serial job gets:
>> 
>> ...cannot run at host...because it offers only hc:slots=0.000000 due
>> to a reservation
>> 
>> Do you use any RQS?
>> 		:
>> 		:
> 
> yes, we do use RQS, but these were far higher than the number of
> running jobs.  Is there any link between RQS and reservation?  It's

I was thinking of a possible side effect.


> good to know that it works for you, so it must be some kind of special 
> (mis)configuration at our site.  I guess we will have to set up another
> test cluster in order to reproduce the error in a more controllable
> environment.  But again, do you know of any possibility to switch off
> backfilling?

AFAIK backfilling is part of the reservation feature: you get both or neither. The only workaround is to give the serial jobs a really high runtime, so that they would end later than the parallel ones.

There was an issue where infinity was judged to be smaller than infinity, so new jobs were always backfilled. Hence I suggested using 9999:00:00 or the like as the default runtime ("default_duration" in the scheduler configuration) instead of "INFINITY" in newer SGE installations.
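
To illustrate why a finite default matters, here is a toy model in Python. It is not SGE's scheduler code: the function names and the way ties are handled are assumptions made only for illustration. It treats a job as a backfill candidate when its expected end is no later than the start of the reservation, and assumes the reservation starts when a running job with the default duration is expected to end.

```python
import math

HOUR = 3600


def parse_duration(text):
    """Convert an "h:m:s" value or "INFINITY" to seconds (hypothetical helper)."""
    if text.strip().upper() == "INFINITY":
        return math.inf
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * HOUR + minutes * 60 + seconds


def fits_before_reservation(now, duration, reservation_start):
    """Toy backfill test: the job must end no later than the reservation starts.

    Assumption: ties count as "fits". The real scheduler may compare differently.
    """
    return now + duration <= reservation_start


for default in ("INFINITY", "9999:00:00"):
    duration = parse_duration(default)
    # A serial job started at t=0 with the default duration; the parallel
    # job's reservation begins when that job is expected to end.
    reservation_start = 0 + duration
    # A new serial job with the same default duration arrives ten hours later.
    now = 10 * HOUR
    print(default, fits_before_reservation(now, duration, reservation_start))
# prints:
# INFINITY True
# 9999:00:00 False
```

With "INFINITY" both sides of the comparison are infinite, so the new job slips in ahead of the reservation. With 9999:00:00 the later-arriving job is expected to end after the reservation starts, so it has to wait.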

-- Reuti


> Thanks, Erik Soyez.
> 
> 
> On Mon, 12 Jul 2010, reuti wrote:
> 
>> On 12.07.2010 at 22:36, reuti wrote:
>> 
>>> On 12.07.2010 at 08:25, soyez wrote:
>>> 
>>>> On Thu, 8 Jul 2010, reuti wrote:
>>>> 
>>>>> On 08.07.2010 at 09:42, soyez wrote:
>>>>> 
>>>>>> Thanks Reuti for your reply,
>>>>>> 
>>>>>> yes, max_reservation is set of course, as reservation works fine
>>>>>> with parallel jobs only.
>>>>>> 
>>>>>> There are several differences between the serial and parallel
>>>>>> queues (limits etc.), but the main difference is the sequence
>>>>>> numbers running in opposite directions in order to implement some
>>>>>> kind of "fill up policy" for single cpu jobs, whereas parallel jobs
>>>>>> are supposed to use different nodes first.  I don't know of any
>>>>>> other way to achieve this.
>>>>> 
>>>>> Fine. This is the way to go, to fill the cluster from both sides.
>>>>> 
>>>>> You mean that slots seem to be reserved from the parallel queue,
>>>>> but serial jobs from the other queue can always slip in?
>>>> 
>>>> Yes, correct.
>>>> 
>>>>> I assume you limited the total number of slots across all queues
>>>>> by an entry in the exechost definition or an RQS?
>>>> 
>>>> Yes, exechost definitions.
>>>> 
>>>> By the way, I forgot to mention that users have to specify a runtime
>>>> for every job.  But according to my calculations there should have
>>>> been no backfilling for those jobs.  Do you know of any scheduling
>>>> parameter to switch off backfilling completely?  That might be worth
>>>> trying.
>>> 
>>> Was it mentioned already: which version of SGE are you running?
>> 
>> Okay, okay, it's obviously late...
>> 
>>> 
>>> For me it seems to work; in 6.2u5 the serial job gets:
>>> 
>>> ...cannot run at host...because it offers only hc:slots=0.000000 due
>>> to a reservation
>>> 
>>> Do you use any RQS?
>>> 
>>> 
>>>>>>> On 07.07.2010 at 18:43, reuti wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> On 07.07.2010 at 09:24, soyez wrote:
>>>>>>>> 
>>>>>>>>> we seem to have a problem with slot reservation only working
>>>>>>>>> for jobs in the same queue.  We have one queue ("batch") for
>>>>>>>>> parallel jobs and another one ("serial") for single cpu jobs.
>>>>>>>>> 
>>>>>>>>> Large parallel jobs (>=32 slots) are submitted with "-R yes"
>>>>>>>>> and this works fine in normal circumstances when competing
>>>>>>>>> with small parallel jobs.
>>>>>>>>> 
>>>>>>>>> Right now the cluster is full with single cpu jobs and all the
>>>>>>>>> parallel jobs in queue "batch" are starving while being bypassed
>>>>>>>>> in the queue "serial".
>>>>>>>> 
>>>>>>>> is there any urgency set up for the serial queue?
>>>>>>>> 
>>>>>>>>> Is this the intended behaviour or is it just some kind of
>>>>>>>>> misconfiguration?
>>>>>>>> 
>>>>>>>> One necessary parameter is:
>>>>>>>> 
>>>>>>>> $ qconf -sconf
>>>>>>> 
>>>>>>> Oops: qconf -ssconf
>>>>>>> 
>>>>>>>> ...
>>>>>>>> max_reservation 20
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=267725
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=267788