[GE users] problems with queueing and scheduling after upgrading to 6.2u5

reuti reuti at staff.uni-marburg.de
Tue Mar 16 12:45:27 GMT 2010


Hi,

On 16.03.2010 at 02:46, snosov wrote:

> After upgrading from 6.1u5 to 6.2u5 we are experiencing a whole
> slew of problems with SGE. The most notable and annoying one is the
> segfaulting of sge_shepherd. I wrote about it in a parallel
> thread and I need to paste some traces there.
>
> This time, however, I would like to discuss the queueing and  
> scheduling problems.
>
> To overcome the lack of per-slot preemption in 6.1u5, we configured
> the following queues to use 4 slots per node:

Well, in 6.2u5 you have slotwise suspension.
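
As a rough sketch only (the queue names high.q, medium.q and low.q are hypothetical, and the threshold of 4 is assumed from your 4 slots per node), the slotwise form of subordinate_list described in queue_conf(5) might replace the four parallel chains with three queues of 4 slots each:

```
# Annotation lines starting with "#" are explanations, not part of the queue file.

# high.q (edit with: qconf -mq high.q)
# Suspend low.q tasks first, then medium.q tasks.
subordinate_list      slots=4(medium.q:0:sr, low.q:1:sr)

# medium.q (edit with: qconf -mq medium.q)
# May only suspend low.q tasks.
subordinate_list      slots=4(low.q:0:sr)
```

In this syntax a higher sequence number means a lower priority, so low.q would be suspended before medium.q, and "sr" picks the task with the shortest run time. Please check it against the queue_conf man page of your installation before relying on it.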


> hight_1.q -> medium_1.q -> low_1.q
> hight_2.q -> medium_2.q -> low_2.q
> hight_3.q -> medium_3.q -> low_3.q
> hight_4.q -> medium_4.q -> low_4.q
>
> So, for example, medium_1.q would preempt low_1.q, and hight_1.q
> would preempt both medium_1.q and low_1.q.
> The high queues had a hard wall-clock limit of 1 hour, the medium
> queues 3 hours, and the low queues were unlimited.

Okay, I get the idea. How many slots are there in each queue?


> To specify the type of queue to use, a user needed to request a  
> complex "low", "medium", or "high", which could be satisfied only  
> by corresponding queues.
> Also, these complexes had 1000, 2000, and 3000 urgency tickets
> respectively to push higher priority jobs up in the scheduler.
>
> Everything worked fine with 6.1u5. After the upgrade, however, we  
> see the following behaviour:
>
> - jobs get assigned to a different queue despite the requested
> complex, e.g., to low_3.q despite the "medium" complex being requested
>
> - those misassigned jobs are not killed when they exceed the hard
> wall-clock limit.
>
> - instead of preempting the lowest priority job on one node, a
> higher priority job is preempted on another node, e.g.
> hight_3.q will preempt medium_3.q on node "A" rather than low_2.q
> on node "B". As a result, the lowest priority jobs continue to run,
> whereas medium priority jobs get suspended.

You mean a wrong queue on a completely different exechost is suspended?

-- Reuti


> I was wondering if there were any changes in the way SGE should be  
> configured that I overlooked.
>
> Thank you,
> Serge.
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248950
