[GE users] Fwd: subnode with empty slots but jobs in queue

jlforrest jlforrest at berkeley.edu
Mon Dec 6 17:47:00 GMT 2010

On 12/6/2010 1:38 AM, reuti wrote:

>> I have a subnode that is currently using 7 out of its 8 slots.  I
>> have jobs waiting in the queue, but they will not start processing.
>> Everything was working fine a couple weeks ago, and then it just
>> stopped.
> the load_threshold can also be set to none, when cores = slots.
> Did you define/request any memory or other resource? Any resource
> quota set in place?
> The waiting jobs are serial ones?

I have a similar problem with SGE 6.2u4. I have a 48-core node
that will run only 30 jobs. Here is the relevant output from
qconf:

hostlist              @allhosts
seq_no                0
load_thresholds       NONE
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make mpi mpich orte
rerun                 FALSE
slots                 1,[compute-0-0.local=4],[compute-0-1.local=4], \
                       [compute-0-2.local=4],[compute-0-3.local=4], \
                       [compute-0-5.local=4],[compute-0-4.local=4], \
                       [compute-0-6.local=4],[compute-0-7.local=48], \

Right now compute-0-8 is down, although qstat still shows
some jobs for it. (Why would this happen?)
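(As far as I understand it, qstat keeps showing jobs until the qmaster
gives up on the host, so stale entries can linger on a dead node. Assuming
the host really is down for good, something like the following should clean
them up -- the job ID here is just a placeholder:)

```shell
# Force-delete a job stranded on a dead execution host.
# -f removes the job record without waiting for the execd to answer,
# so use it only when the host is genuinely unreachable.
qdel -f <jobid>

# Once the host comes back, clear any error state left on its
# queue instance so the scheduler will use it again.
qmod -cq all.q@compute-0-8.local
```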

The qstat output for compute-0-7 shows

all.q at compute-0-7.local        BIP   0/48/48        29.05    lx26-amd64

and then it lists 48 serial jobs underneath! Yet ssh-ing to
compute-0-7 and running ps clearly shows only 29 jobs running.

All the jobs in this cluster are serial jobs. Any idea why
I can't run 18 more jobs on compute-0-7? I restarted the
qmaster but it didn't make any difference.
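(In case it helps anyone else hitting this, these are the checks reuti's
questions above point at -- memory/consumable requests and resource quotas
can both cap slot usage below the queue's slot count. The commands are
standard SGE 6.2; the job ID is a placeholder:)

```shell
# Ask the scheduler why a waiting job is not being dispatched
# (needs "schedd_job_info true" in the scheduler config, qconf -msconf).
qstat -j <jobid>

# Show queue instances with an explanation of any alarm state.
qstat -f -explain a

# List resource quota sets -- an RQS rule can silently limit slots
# per host or per user regardless of the queue's slots setting.
qconf -srqsl
qconf -srqs

# Dump per-host load values and consumables for the suspect host.
qhost -F -h compute-0-7.local
```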


Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
jlforrest at berkeley.edu


