[GE users] problems with queueing and scheduling after upgrading to 6.2u5
reuti at staff.uni-marburg.de
Tue Mar 16 12:45:27 GMT 2010
On 16.03.2010 at 02:46, snosov wrote:
> After upgrading from 6.1u5 to 6.2u5 we are experiencing a whole
> slew of problems with SGE. The most notable and annoying one is the
> segfaulting of sge_shepherd. I wrote about it in a parallel
> thread and I still need to paste some traces there.
> This time, however, I would like to discuss the queueing and
> scheduling problems.
> To overcome the lack of per-slot preemption in 6.1u5, we configured
> the following queues to use 4 slots per node:
Well, in 6.2u5 you have slotwise suspension.
> hight_1.q -> medium_1.q -> low_1.q
> hight_2.q -> medium_2.q -> low_2.q
> hight_3.q -> medium_3.q -> low_3.q
> hight_4.q -> medium_4.q -> low_4.q
> So, for example, medium_1.q would preempt low_1.q, and hight_1.q
> would preempt both medium_1.q and low_1.q.
> High queues had a hard wall-clock limit of 1 hour, medium queues had
> 3 hours, and low queues were unlimited.
Okay, I get the idea. How many slots are there in each queue?
> To specify the type of queue to use, a user needed to request a
> complex "low", "medium", or "high", which could be satisfied only
> by corresponding queues.
> Also, these complexes had urgency values of 1000, 2000, and 3000
> respectively, to push higher priority jobs up in the scheduler.
> Everything worked fine with 6.1u5. After the upgrade, however, we
> see the following behaviour:
> - jobs get assigned to a different queue despite the requested
> complex, e.g., to low_3.q despite the "medium" complex being requested
> - those misassigned jobs are not killed when they exceed the hard
> wall-clock limit.
> - instead of the lowest priority job being preempted on one node, a
> higher priority job is preempted on another node, e.g.
> hight_3.q will preempt medium_3.q on node "A" rather than low_2.q
> on node "B". As a result, the lowest priority jobs continue to run,
> whereas medium priority jobs get suspended.
You mean a wrong queue on a completely different exechost is suspended?
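To make the question concrete, here is a minimal Python sketch of the subordination chains as described in the quoted message. It is a toy model, not SGE code: it assumes that subordination acts only on queue instances on the same host as the queue that triggers it, and that a single running job is enough to trigger it. The function name `suspended_instances` and the job placement are hypothetical; the queue names and the hosts "A" and "B" come from the example above.

```python
# Toy model of queue-wise subordination (assumption: a queue can only
# suspend its subordinates on the SAME host). Not actual SGE code.

# Chains from the thread: hight_N.q -> medium_N.q -> low_N.q, N = 1..4
SUBORDINATES = {}
for i in range(1, 5):
    SUBORDINATES[f"hight_{i}.q"] = [f"medium_{i}.q", f"low_{i}.q"]
    SUBORDINATES[f"medium_{i}.q"] = [f"low_{i}.q"]


def suspended_instances(running):
    """Return the queue instances that end up suspended.

    running: set of (queue, host) pairs that currently hold a job.
    """
    suspended = set()
    for queue, host in running:
        for sub in SUBORDINATES.get(queue, []):
            # Only an instance on the same host can be affected.
            if (sub, host) in running:
                suspended.add((sub, host))
    return suspended


# Hypothetical placement matching the example in the quoted message.
running = {("hight_3.q", "A"), ("medium_3.q", "A"), ("low_2.q", "B")}
print(sorted(suspended_instances(running)))
```

Under these assumptions, a job landing in hight_3.q on node "A" can only ever suspend medium_3.q or low_3.q on node "A"; low_2.q on node "B" is outside its chain. Which job gets suspended therefore depends on which queue instance the scheduler picks for the high priority job.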
> I was wondering if there were any changes in the way SGE should be
> configured that I overlooked.
> Thank you,