[GE users] SGE-6.2u5: Slot reservation for different queues?

soyez E.Soyez at science-computing.de
Tue Jul 13 09:35:22 BST 2010


Good morning Reuti,

On Mon, 12 Jul 2010, reuti wrote:
>		:
>		:
> For me it seems to work; in 6.2u5 the serial job gets:
>
> ...cannot run at host...because it offers only hc:slots=0.000000 due
> to a reservation
>
> Do you use any RQS?
>		:
>		:

Yes, we do use RQS, but the limits were far higher than the number of
running jobs.  Is there any link between RQS and reservation?  It's
good to know that it works for you, so it must be some kind of special
(mis)configuration at our site.  I guess we will have to set up another
test cluster in order to reproduce the error in a more controllable
environment.  But again, do you know of any way to switch off
backfilling?
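
For anyone who wants to check the same settings on their own cluster,
here is a minimal sketch.  The helper name get_sched_param and the
sample text are made up for illustration; only max_reservation 20 is
taken from this thread.  It assumes "qconf -ssconf" prints one
"<name> <value>" pair per line, and it only returns the first word of
multi-word values.

```shell
#!/bin/sh
# Print the value of one scheduler parameter from "qconf -ssconf"-style
# text on stdin (lines of the form "<name> <value>").
# get_sched_param is a hypothetical helper, not part of SGE.
get_sched_param() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Illustrative sample only: max_reservation is the value quoted in this
# thread; the schedule_interval line is just a placeholder.
sample_conf='schedule_interval 0:0:15
max_reservation 20'

printf '%s\n' "$sample_conf" | get_sched_param max_reservation

# On a real cluster (needs the SGE binaries in PATH):
#   qconf -ssconf | get_sched_param max_reservation
#   qconf -srqsl          # names of the configured resource quota sets
#   qstat -j <job_id>     # scheduling info, if schedd_job_info is enabled
```

A value of 0 for max_reservation would mean that no reservations are
made at all, so that is the first thing to rule out.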

Thanks, Erik Soyez.


On Mon, 12 Jul 2010, reuti wrote:

> On 12.07.2010 at 22:36, reuti wrote:
>
>> On 12.07.2010 at 08:25, soyez wrote:
>>
>>> On Thu, 8 Jul 2010, reuti wrote:
>>>
>>>> On 08.07.2010 at 09:42, soyez wrote:
>>>>
>>>>> Thanks Reuti for your reply,
>>>>>
>>>>> yes, max_reservation is set of course, as reservation works fine
>>>>> as long as only parallel jobs are involved.
>>>>>
>>>>> There are several differences between the serial and the parallel
>>>>> queue (limits etc.), but the main difference is the sequence
>>>>> numbers, which run in opposite directions in order to implement
>>>>> some kind of "fill up policy" for single cpu jobs, whereas parallel
>>>>> jobs are supposed to use different nodes first.  I don't know of
>>>>> any other way to achieve this.
>>>>
>>>> Fine. This is the way to go, to fill the cluster from both sides.
>>>>
>>>> You mean that slots seem to be reserved in the parallel queue, but
>>>> serial jobs from the other queue can always slip in?
>>>
>>> Yes, correct.
>>>
>>>> I assume you limited the total number of slots across all queues by
>>>> an entry in the exechost definition or an RQS?
>>>
>>> Yes, exechost definitions.
>>>
>>> By the way, I forgot to mention that users have to specify a runtime
>>> for every job.  But according to my calculations there should have
>>> been no backfilling for those jobs.  Do you know of any scheduling
>>> parameter to switch off backfilling completely?  That might be worth
>>> trying.
>>
>> Was it mentioned already: which version of SGE are you running?
>
> Okay, okay, it's obviously late...
>
>>
>> For me it seems to work; in 6.2u5 the serial job gets:
>>
>> ...cannot run at host...because it offers only hc:slots=0.000000 due
>> to a reservation
>>
>> Do you use any RQS?
>>
>>
>>>>>> On 07.07.2010 at 18:43, reuti wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 07.07.2010 at 09:24, soyez wrote:
>>>>>>>
>>>>>>>> we seem to have a problem with slot reservation only working
>>>>>>>> for jobs in the same queue.  We have one queue ("batch") for
>>>>>>>> parallel jobs and another one ("serial") for single cpu jobs.
>>>>>>>>
>>>>>>>> Large parallel jobs (>=32 slots) are submitted with "-R yes"
>>>>>>>> and this works fine in normal circumstances when competing
>>>>>>>> with small parallel jobs.
>>>>>>>>
>>>>>>>> Right now the cluster is full with single cpu jobs and all the
>>>>>>>> parallel jobs in queue "batch" are starving while being bypassed
>>>>>>>> in the queue "serial".
>>>>>>>
>>>>>>> is there any urgency set up for the serial queue?
>>>>>>>
>>>>>>>> Is this the intended behaviour or is it just some kind of
>>>>>>>> misconfiguration?
>>>>>>>
>>>>>>> One necessary parameter is:
>>>>>>>
>>>>>>> $ qconf -sconf
>>>>>>
>>>>>> Oops: qconf -ssconf
>>>>>>
>>>>>>> ...
>>>>>>> max_reservation 20