[GE users] Slots and CPU's

templedf dan.templeton at sun.com
Fri Mar 5 20:10:14 GMT 2010


Welcome aboard!  Is this that single-slot desktop queue, or are we 
talking about one of the other queues?

In general, the way SGE tracks CPU usage is through slot usage.  The 
default assumption is that a single job consumes a single slot and hence 
a single CPU.  If a job needs to consume more than one slot (on one 
machine or across multiple) it has to be submitted as a parallel job.  
For example, "qsub -pe fea 8 job.sh" will submit a job that uses the 
'fea' parallel environment to consume 8 slots.  ("qconf -sp fea" will 
tell you about that PE.  If I remember correctly, we set that one up to 
require that all the assigned slots be on the same node.)  If you submit 
your jobs that way, SGE won't let more slots be consumed than are available.
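
To make the slot accounting concrete, here is a toy sketch in plain shell using the numbers from the question below (16 slots, 4 already in use). It is not SGE code and does not talk to the scheduler; the function name can_dispatch and the two variables are made up for illustration. It only mimics the rule that a job requesting N slots with "qsub -pe <pe> N" runs when N fits into the free slots and pends otherwise.

```shell
# Toy model of slot accounting, NOT SGE code: a job is dispatched only if
# the slots it requests fit into what is still free on the host.
TOTAL_SLOTS=16   # slots configured on the host (from the question)
USED_SLOTS=4     # slots held by the job that is already running

can_dispatch() {
    # $1 = number of slots the job requests, as in "qsub -pe <pe> <n>"
    if [ "$1" -le $((TOTAL_SLOTS - USED_SLOTS)) ]; then
        echo "run"
    else
        echo "pend"
    fi
}

can_dispatch 12   # prints "run"  (12 free slots are enough)
can_dispatch 13   # prints "pend" (only 12 slots are free)
```

The real scheduler does this bookkeeping for you, but only for slots that were requested through a parallel environment.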

Now, in that single-slot desktop queue, the assumption is that only one 
job will run there at a time, and the problem is that you're actually 
competing with the desktop user himself.  Because SGE has no handle on 
what the user is doing beyond the CPU and memory state, the way to keep 
SGE from competing with the user's processes is to set the 
load_threshold low enough.  That's where np_load_avg=1.0 comes in.  And 
actually, you probably want to set it even lower.  If it's an 8-core 
machine, and four cores are available to SGE, then you probably want to 
set the load_threshold to np_load_avg=0.5 so that SGE won't schedule 
anything there if the user is using more than 4 (half) of the cores.
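
The arithmetic behind that number can be sketched as follows. np_load_avg is the load average divided by the number of processors, so the threshold is the share of cores the user may occupy before SGE backs off. The helper name np_threshold and the queue name desktop.q are made up for illustration, and the qconf line is left as a comment because it needs a live cluster and admin rights. As far as I recall, the queue attribute is spelled load_thresholds in the queue configuration, so check "qconf -sq <queue>" before changing anything.

```shell
# np_load_avg = load average / number of processors, so the threshold is
# (cores the desktop user may occupy) / (total cores on the machine).
np_threshold() {
    # $1 = cores the desktop user may occupy, $2 = total cores
    awk -v u="$1" -v t="$2" 'BEGIN { printf "%.2f\n", u / t }'
}

np_threshold 4 8   # prints 0.50

# Applying the value (assumption: a queue named "desktop.q" exists):
#   qconf -mattr queue load_thresholds np_load_avg=0.5 desktop.q
```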

Make sense?


On 03/05/10 11:40, murphygb wrote:
> Hello all.  Newbie here, so forgive me if this is obvious.  I've read and read, jumping from online manual to man pages and back again, and can't seem to figure out what I need to do.
>
> Imagine I have a 16 processor machine configured with 16 job slots; np_load_avg is set to 1.  I could submit one 16 processor job or sixteen 1 processor jobs.  Let's say someone is running a 4 processor job there.  I would like the next job to only go to that machine if the next job is asking for 12 processors or less, and if that next job is asking for more than 12, to pend until the resources are available.  If the next job only asks for 4 processors, then run (now the machine is using 8 processors and thus the next job will only run or pend depending on whether the user asks for 8 processors or less and so on.)
>
> Do I use a complex like cpu or s_cpu when issuing the qsub command?  Do I need to define a consumable?  Right now, the jobs always go there (as long as the load average is under 16 since np_load_avg is 1), which is typical, and that means I can have, say, three 12 processor jobs dispatched there, all competing for only 16 processors.
>
> Succinctly, I am looking for a 1 to 1 correlation between available processors and what the user asks for when submitting.  Any help or insight would be greatly appreciated.
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247228
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

