[GE users] problems with queueing and scheduling after upgrading to 6.2u5

reuti reuti at staff.uni-marburg.de
Wed Mar 17 18:09:47 GMT 2010


Am 17.03.2010 um 18:56 schrieb reuti:

> Hi,
>
> Am 16.03.2010 um 02:46 schrieb snosov:
>
>> After upgrading from 6.1u5 to 6.2u5 we are experiencing a whole
>> slew of problems with SGE. The most notable and annoying one is the
>> segfaulting of sge_shepherd; I wrote about it in a parallel
>> thread and still need to paste some traces there.
>>
>> This time, however, I would like to discuss the queueing and
>> scheduling problems.
>>
>> To overcome the lack of per-slot preemption in 6.1u5, we
>> configured the following queues to use 4 slots per node:
>>
>> hight_1.q -> medium_1.q -> low_1.q
>> hight_2.q -> medium_2.q -> low_2.q
>> hight_3.q -> medium_3.q -> low_3.q
>> hight_4.q -> medium_4.q -> low_4.q
>>
>> So, for example, medium_1.q would preempt low_1.q, and hight_1.q
>> would preempt both medium_1.q and low_1.q.
>> High queues had hard wall-clock limit of 1 hour, medium queues had
>> 3 hours and low queues were unlimited.
>>
>> To specify the type of queue to use, a user needed to request a
>> complex "low", "medium", or "high", which could be satisfied only
>> by the corresponding queues.
>> These complexes also carried 1000, 2000, and 3000 urgency tickets
>> respectively, to push higher-priority jobs up in the scheduler.
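For readers unfamiliar with this setup, the complex/urgency scheme described above might look roughly as follows in `qconf` terms. This is a sketch only: the field values, the queue attachment, and the job script name are assumptions, not the poster's actual configuration (only the complex names "low"/"medium"/"high" and the urgency values come from the post).

```shell
# Sketch of the complex definitions as they would appear in the
# complex configuration (qconf -mc); urgency drives job priority:
#
#   name    shortcut  type  relop  requestable  consumable  default  urgency
#   low     low       BOOL  ==     YES          NO          0        1000
#   medium  med       BOOL  ==     YES          NO          0        2000
#   high    high      BOOL  ==     YES          NO          0        3000

# Attach a complex to its matching queue (hypothetical invocation):
qconf -mattr queue complex_values medium=1 medium_1.q

# A user would then request the queue class via the complex, e.g.
# a medium job bounded by the 3-hour wall-clock limit:
qsub -l medium=1 -l h_rt=3:00:00 job.sh
```

With `requestable YES` and a forced attachment only on the matching queues, a `-l medium=1` request should be satisfiable only by the `medium_*.q` queues, which is the behavior the poster reports as broken after the upgrade.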
>>
>> Everything worked fine with 6.1u5. After the upgrade, however, we
>> see the following behaviour:
>>
>> - jobs get assigned to a queue that does not match the requested
>> complex, e.g., to low_3.q despite the "medium" complex being
>> requested
>>
>> - those mis-assigned jobs are not killed when they exceed the
>> hard wall-clock limit.

BTW: do you mean the hard wall-clock limit of the job or of the queue?

For us the opposite is happening: jobs requesting 72 hrs end up in a
2 hrs queue and are killed too early.
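For reference, such a job would typically carry a hard runtime request, which the scheduler is supposed to match against each queue's limit. A minimal sketch (the job script and queue names are hypothetical):

```shell
# Request 72 hours of wall-clock time; the scheduler should only
# place the job in a queue whose h_rt limit is >= the request.
qsub -l h_rt=72:00:00 longjob.sh

# A queue's own wall-clock limit can be inspected with:
qconf -sq low_1.q | grep h_rt
```

The bug described here is that this matching fails: the job lands in a queue whose `h_rt` is shorter than the requested runtime and is terminated by the queue limit.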

Issue: http://gridengine.sunsource.net/issues/show_bug.cgi?id=3253

-- Reuti


> Yep, we just faced this too. It looks like SGE has a hiccup: all
> mis-assigned jobs seem to be scheduled in one and the same
> scheduling cycle. Once it's over, scheduling is fine again for the
> next couple of hours.
>
> I'll file an issue pointing to this thread.
>
> -- Reuti
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=249205

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].



More information about the gridengine-users mailing list