[GE users] Newbie question - queues, queue instances, and slots

craffi dag at sonsorol.org
Mon Jun 1 20:23:48 BST 2009


A few things:

  - Your priority setting has no effect on SGE or on the order in  
which jobs are dispatched. The parameter you are setting is  
effectively the unix nice level of your tasks - this is an OS thing  
that has nothing to do with SGE policies, resource allocation, or  
scheduling. Most people don't use this parameter unless they are  
intentionally oversubscribing (running more jobs than there are CPUs  
on each machine).
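
For what it's worth, the queue "priority" value maps to nice, where  
positive values mean a *lower* OS run priority - so "priority 10" on  
your "high" queue actually makes its processes gentler on the machine,  
not more urgent. You can see this on an execution host with something  
like:

   ps -o pid,ni,comm -u gladden

(the -u argument is just your login from the qstat output below); the  
NI column will show 10 for jobs from that queue, while SGE's dispatch  
order is unchanged.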

  - There are a few different ways to get what you want; I'm not sure  
it's worth going through the details if you are just experimenting at  
this point. If you can clearly explain what you want the system to  
do, we can probably suggest ways to implement it.

If you insist on keeping two cluster queues, you may want to search  
the SGE docs for information on "subordinate queues" - those let you  
set up a system in which the low-priority queue is suspended while  
the high-priority queue is occupied, as sketched below.
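
As a rough sketch using your queue names (check the queue_conf(5) man  
page for the exact semantics before relying on this): edit the  
high-priority queue with "qconf -mq high" and set

   subordinate_list      all.q=1

which suspends all.q on a host as soon as one or more "high" slots  
are occupied there, and resumes it once they free up.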

You may also want to read up on SGE resource quotas, which are a  
fantastic tool - reading the docs on those may give you ideas for  
quotas that would let you simplify the queue structure. For example,  
it would be trivial to create a global resource quota that does not  
let the system run more than 10 active jobs at any one time  
-- this is one way to deal with the "20 jobs / 10 processors" issue  
you have noted, as shown in the sketch below.
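
Something along these lines would do it (a sketch - the rule set name  
is made up; see the sge_resource_quota(5) man page). Create it with  
"qconf -arqs":

   {
      name         max_active_jobs
      description  "No more than 10 running slots cluster-wide"
      enabled      TRUE
      limit        to slots=10
   }

Since each of your jobs takes one slot, this caps the cluster at 10  
running jobs regardless of how many queues they arrive through.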

Welcome to SGE!


-Chris



On Jun 1, 2009, at 2:29 PM, jagladden wrote:

> I am new to SGE, so I am trying it out on a small test cluster as a  
> first step.  Having done some experiments, I find myself a little  
> confused about how SGE handles queue instances and slots.
>
> My test cluster has two compute nodes, with a total of 10 cores, as  
> shown by 'qhost':
>
> [root@testpe bin64]# qhost
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> compute-0-0             lx26-amd64      2  0.00    2.0G  102.8M    2.0G     0.0
> compute-0-1             lx26-amd64      8  0.00   15.7G  119.9M  996.2M     0.0
>
> I have set up two cluster queues.  The first of these is the  
> standard default queue 'all.q' as shown by 'qconf -sq':
>
> [root@testpe ~]# qconf -sq all.q
> qname                 all.q
> hostlist              @allhosts
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make mpich mpi orte
> rerun                 FALSE
> slots                 1,[compute-0-0.local=2],[compute-0-1.local=8]
> ...
>
> The second is a "high priority" queue, which is identical except for  
> having a higher default job priority:
>
> [root@testpe ~]# qconf -sq high
> qname                 high
> hostlist              @allhosts
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              10
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make
> rerun                 FALSE
> slots                 1,[compute-0-0.local=2],[compute-0-1.local=8]
> ...
>
>
> My point of confusion arises when I submit jobs to both of these  
> queues.  There are only 10 CPUs available, and I would expect the  
> queuing system to allow a maximum of 10 jobs to run at any one  
> time.  What happens in practice is that SGE allows 10 jobs from each  
> of the two queues to run at the same time, for a total of 20 jobs,  
> thus effectively allocating two jobs to each CPU.  In the following  
> example I have submitted 24 jobs, 12 to each queue.  Note that  
> 'qstat' shows 20 of them running simultaneously, with four waiting:
>
> [gladden@testpe batchtest]$ qstat
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>     110 0.55500 test_simpl gladden      r     06/01/2009 10:08:37 all.q@compute-0-0.local            1
>     114 0.55500 test_simpl gladden      r     06/01/2009 10:08:43 all.q@compute-0-0.local            1
>     109 0.55500 test_simpl gladden      r     06/01/2009 10:08:37 all.q@compute-0-1.local            1
>     111 0.55500 test_simpl gladden      r     06/01/2009 10:08:40 all.q@compute-0-1.local            1
>     112 0.55500 test_simpl gladden      r     06/01/2009 10:08:40 all.q@compute-0-1.local            1
>     113 0.55500 test_simpl gladden      r     06/01/2009 10:08:40 all.q@compute-0-1.local            1
>     115 0.55500 test_simpl gladden      r     06/01/2009 10:08:43 all.q@compute-0-1.local            1
>     116 0.55500 test_simpl gladden      r     06/01/2009 10:08:43 all.q@compute-0-1.local            1
>     117 0.55500 test_simpl gladden      r     06/01/2009 10:08:46 all.q@compute-0-1.local            1
>     118 0.55500 test_simpl gladden      r     06/01/2009 10:08:46 all.q@compute-0-1.local            1
>     121 0.55500 test_simpl gladden      r     06/01/2009 10:09:08 high@compute-0-0.local             1
>     126 0.55500 test_simpl gladden      r     06/01/2009 10:09:11 high@compute-0-0.local             1
>     122 0.55500 test_simpl gladden      r     06/01/2009 10:09:08 high@compute-0-1.local             1
>     123 0.55500 test_simpl gladden      r     06/01/2009 10:09:08 high@compute-0-1.local             1
>     124 0.55500 test_simpl gladden      r     06/01/2009 10:09:08 high@compute-0-1.local             1
>     125 0.55500 test_simpl gladden      r     06/01/2009 10:09:11 high@compute-0-1.local             1
>     127 0.55500 test_simpl gladden      r     06/01/2009 10:09:11 high@compute-0-1.local             1
>     128 0.55500 test_simpl gladden      r     06/01/2009 10:09:11 high@compute-0-1.local             1
>     129 0.55500 test_simpl gladden      r     06/01/2009 10:09:14 high@compute-0-1.local             1
>     130 0.55500 test_simpl gladden      r     06/01/2009 10:09:14 high@compute-0-1.local             1
>     119 0.55500 test_simpl gladden      qw    06/01/2009 10:08:44                                    1
>     120 0.55500 test_simpl gladden      qw    06/01/2009 10:08:45                                    1
>     131 0.55500 test_simpl gladden      qw    06/01/2009 10:09:12                                    1
>     132 0.55500 test_simpl gladden      qw    06/01/2009 10:09:13                                    1
>
> What I had expected was that SGE would first dispatch 10 jobs from  
> the "high priority" queue and then, as those jobs completed and  
> slots became available, dispatch and run additional jobs from the  
> default queue - allowing only 10 jobs to run at one time.  
> Instead, SGE seems to regard the 10 queue instances associated with  
> the "high" queue as having slots that are independent of the 10  
> associated with "all.q".
>
> Have I failed to configure something properly?  Is there not a way  
> to feed jobs from multiple queues to the same set of nodes while  
> limiting the number of active jobs to one per CPU?
>
> James Gladden

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=200194
