[GE users] sge queues stomping on each other

reuti reuti at staff.uni-marburg.de
Mon Oct 4 18:03:56 BST 2010


On 01.10.2010 at 08:46, elimorris wrote:

> I tried to set up a couple of new queues besides the default 'all.q' on my small cluster. I cloned the default queue in qmon and then made one of the queues subordinate to the other, so that jobs in the lower-priority queue are suspended in favor of jobs in the higher-priority queue when the cluster does not have enough processors to run everything submitted. It's a small group and we just need a simple scheme.
>
> Here's the problem: the jobs from one queue now try to run on the same node/processors as the jobs from the other queue, so that one compute node gets loaded up with 16 processors' worth of jobs even though the node only has 8 processors, while some nodes go totally unused. It looks like one queue isn't 'aware' of the other and both are trying to use the same processors, instead of knowing that one queue has jobs on node X and therefore using node Y. Does anyone know how to deal with this? This is my first exposure to scheduling, and SGE is a beast to understand at first. I'd appreciate any help.
>
> Some people on the Rocks Cluster mailing list recommended I try this:
> qconf -mattr exechost complex_values slots=8 compute-0-0

If you do this, jobs in the high-priority queue may not start when all slots are already in use by the low-priority queue: there is no preemption built into SGE, and you are limiting the slot count to 8 per machine. Since suspended jobs still count as running, you must allow 16 slots per machine for the subordination to work, so this setup doesn't look necessary.
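For example, one way to express this (queue names high.q/low.q and the hostname are placeholders for your actual setup, and the exact -dattr value syntax may need adjusting) is to drop the per-host slot limit again and give each queue its own 8 slots per host instead:

    # undo the per-host limit from the Rocks suggestion (hostname is an example)
    qconf -dattr exechost complex_values slots=8 compute-0-0
    # let each queue offer 8 slots per host; together they may oversubscribe,
    # which is intended, because suspended low.q jobs still occupy their slots
    qconf -mattr queue slots 8 high.q
    qconf -mattr queue slots 8 low.q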

Did you set up any sort order?
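You can check this in the scheduler configuration; by default queues are sorted by load, the alternative being sequence numbers (output shown is illustrative):

    qconf -ssconf | grep queue_sort_method
    # queue_sort_method    load    <- default: pick the least-loaded host first
    # with "seqno" instead, hosts are filled in order of the queues' seq_no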

How are you testing this? By a loop submitting several jobs? The actual load might not be reflected that quickly in the reported np_load_avg, and you may need to set up job_load_adjustments.
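The job_load_adjustments setting makes the scheduler add an artificial load for each freshly dispatched job, so a burst of submissions doesn't all land on the same host before its real load catches up. A sketch of the relevant lines in the scheduler configuration (these particular values mirror the stock defaults; tune them for your cluster):

    qconf -msconf
    # in the editor:
    job_load_adjustments        np_load_avg=0.50
    load_adjustment_decay_time  0:7:30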

What you might need is slot-wise suspension; otherwise the lower-priority queue instance is suspended either completely or not at all.
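Slot-wise suspension is configured in the superordinate queue's subordinate_list (available in newer SGE releases; queue names and the threshold below are examples, not your actual values):

    qconf -mq high.q
    # in the editor:
    subordinate_list    slots=8(low.q:0:sr)

With this, as soon as more than 8 slots are occupied on a host across both queues, individual jobs in low.q are suspended one by one ("sr" = shortest running first) instead of the whole low.q instance on that host.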

Can you please post your current scheduler configuration?

-- Reuti

> for each node and I did that, but I'm still getting the same problem. 
> Thanks very much,
> Eli
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=284842
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

