[GE users] node oversubscription despite slots=8

reuti reuti at staff.uni-marburg.de
Mon May 17 10:25:04 BST 2010


On 17.05.2010 at 10:49, gmareels wrote:

> We have 10 machines (nodes) with each 8 cores. I have set up each execution host with "complex_values slots=8".


> In addition, we have two queues (long.q and short.q) which differ in default priority and in the time within which a job must complete. Both queues have access to all 10 machines.
> So to prevent oversubscription of the machines, I set up the slots variable.
> When I submit an 8-core job to a single machine in the short queue (short@node001) and then submit another 8-core job to the same machine in long.q, the second job correctly stays pending.
> However, when we submit a 4-core job to short@node001 and then an 8-core job in long.q to the same machine (long@node001), the second job is allowed to start. It should not be allowed to run, as 4 cores are already occupied! 


> "qstat -F" shows the following for long@node001:
>        hc:slots=-4
> What is going wrong? How can I solve this?
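To make the arithmetic behind hc:slots=-4 concrete, here is a minimal Python sketch of a host-level slots consumable shared by all queue instances on one host. The `Host` class and its `enforce` flag are hypothetical illustrations, not Grid Engine code. With slots=8, a running 4-slot job leaves 4 free, so an 8-slot request should pend. A value of -4 means the host has been charged 4 + 8 = 12 slots, i.e. the 8-slot job was started despite the limit.

```python
from dataclasses import dataclass, field

# Hypothetical model, not Grid Engine's implementation: every queue instance
# on a host (short.q@node001, long.q@node001, ...) draws from one shared
# host-level pool defined by "complex_values slots=8".
@dataclass
class Host:
    name: str
    slots: int                                   # host-level slots capacity
    running: dict = field(default_factory=dict)  # job name -> slots in use

    def free(self) -> int:
        # Remaining host-level slots, the quantity qstat -F reports as hc:slots
        return self.slots - sum(self.running.values())

    def start(self, job: str, request: int, enforce: bool = True) -> bool:
        # With enforcement, a job starts only if the shared pool covers the
        # request, no matter which queue it was submitted to.
        if enforce and request > self.free():
            return False  # job stays pending
        self.running[job] = request
        return True


if __name__ == "__main__":
    node = Host("node001", slots=8)
    print(node.start("short_job", 4))  # True: 4 of 8 slots now in use
    print(node.start("long_job", 8))   # False: only 4 free, so it pends
    # hc:slots=-4 corresponds to the 8-slot job having started anyway:
    node.start("long_job", 8, enforce=False)
    print(node.free())                 # -4
```

The `enforce=False` path only reproduces the observed counter; it says nothing about why the scheduler skipped the check.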

Did you add the complex_values entry to each exechost while jobs were running? SGE sometimes gets out of sync in that case, but it should heal itself once those jobs have drained from the system.

Do you observe this on all machines in the cluster?
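Checking every node by eye is tedious, so here is a Python sketch that scans the text of `qstat -F` output and reports each queue instance whose hc:slots value is negative. The line layout it expects (an unindented `queue@host` line followed by indented `hc:slots=N` lines) is a simplified assumption, and `negative_slots` is a hypothetical helper. Real output has more columns and resources, so the regular expressions may need adjusting for your version.

```python
from __future__ import annotations

import re

# Assumed, simplified layout: an unindented queue-instance line such as
# "long.q@node001  BIP 0/8/8 ..." followed by indented resource lines such
# as "\thc:slots=-4". Header and separator lines contain no "@" and are skipped.
QUEUE_LINE = re.compile(r"^(\S+@\S+)")
SLOTS_LINE = re.compile(r"^\s+hc:slots=(-?\d+(?:\.\d+)?)")


def negative_slots(qstat_output: str) -> dict[str, float]:
    """Return {queue_instance: hc:slots} for every instance with a value < 0."""
    result: dict[str, float] = {}
    current = None  # queue instance the following resource lines belong to
    for line in qstat_output.splitlines():
        m = QUEUE_LINE.match(line)
        if m:
            current = m.group(1)
            continue
        m = SLOTS_LINE.match(line)
        if m and current is not None:
            value = float(m.group(1))
            if value < 0:
                result[current] = value
    return result


if __name__ == "__main__":
    # Invented sample text, for illustration only
    sample = (
        "long.q@node001   BIP 0/8/8\n"
        "\thc:slots=-4\n"
        "long.q@node002   BIP 0/0/8\n"
        "\thc:slots=8\n"
    )
    print(negative_slots(sample))  # {'long.q@node001': -4.0}
```

The input text could, for example, be captured with `subprocess.run(["qstat", "-F", "slots"], capture_output=True, text=True).stdout`, assuming `qstat -F slots` limits the listing to that resource on your installation.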

-- Reuti

> Many thanks,
> Guy
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257569
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].



More information about the gridengine-users mailing list