[GE users] Fwd: subnode with empty slots but jobs in queue

jlforrest jlforrest at berkeley.edu
Mon Dec 6 17:47:00 GMT 2010


On 12/6/2010 1:38 AM, reuti wrote:

>> I have a subnode that is currently using 7 out of its 8 slots.  I
>> have jobs waiting in the queue, but they will not start processing.
>> Everything was working fine a couple weeks ago, and then it just
>> stopped.
>
> the load_threshold can also be set to none, when cores = slots.
>
> Did you define/request any memory or other resource? Any resource
> quota set in place?
>
> The waiting jobs are serial ones?

I have a similar problem with SGE 6.2u4. I have a 48-core
node that will only run 30 jobs. Here is the relevant
output from qconf:

---
hostlist              @allhosts
seq_no                0
load_thresholds       NONE
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make mpi mpich orte
rerun                 FALSE
slots                 1,[compute-0-0.local=4],[compute-0-1.local=4], \
                       [compute-0-2.local=4],[compute-0-3.local=4], \
                       [compute-0-5.local=4],[compute-0-4.local=4], \
                       [compute-0-6.local=4],[compute-0-7.local=48], \
                       [compute-0-8.local=48]
---
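
For anyone reading along who doesn't know the slots syntax: the first
value is the default for every host in the hostlist, and each bracketed
host=value pair overrides it for that one host. Below is a minimal Python
sketch of how that line can be read. The names parse_slots and SLOTS are
made up for this illustration and are not part of SGE.

```python
import re

def parse_slots(value):
    """Split an SGE 'slots' value into (default, {host: slots})."""
    # Join continuation lines: drop each backslash-newline and the indentation after it.
    flat = re.sub(r"\\\s*\n\s*", "", value).strip()
    # Everything before the first comma is the default slot count.
    default_part, _, rest = flat.partition(",")
    # Each [host=N] entry overrides the default for that host.
    overrides = {host: int(n)
                 for host, n in re.findall(r"\[([^=\]]+)=(\d+)\]", rest)}
    return int(default_part), overrides

# The slots value exactly as pasted from the qconf output above.
SLOTS = r"""1,[compute-0-0.local=4],[compute-0-1.local=4], \
                       [compute-0-2.local=4],[compute-0-3.local=4], \
                       [compute-0-5.local=4],[compute-0-4.local=4], \
                       [compute-0-6.local=4],[compute-0-7.local=48], \
                       [compute-0-8.local=48]
"""

default, per_host = parse_slots(SLOTS)
for host in sorted(per_host):
    print(host, per_host[host])
```

So compute-0-7.local is configured for 48 slots, and the queue
configuration itself does not cap it at 30.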

Right now compute-0-8 is down, although qstat still shows
some jobs for it. (Why would this happen?)

The qstat output for compute-0-7 shows

all.q@compute-0-7.local        BIP   0/48/48        29.05    lx26-amd64

and then it lists 48 serial jobs underneath! Yet ssh-ing to
compute-0-7 and running ps shows only 29 jobs actually running.
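
As a rough aid to reading that qstat line, here is a small Python sketch.
It assumes the SGE 6.2 "qstat -f" column layout (queuename, qtype,
resv/used/tot., load_avg, arch). parse_queue_instance is a hypothetical
helper written for this illustration, not an SGE tool. Under that
assumption, 0/48/48 means the scheduler counts all 48 slots as used, so
from its point of view there is nothing free to dispatch into, whatever
ps shows on the node.

```python
def parse_queue_instance(line):
    """Parse one queue-instance line of `qstat -f` output (6.2 layout assumed)."""
    # The list archive rewrites '@' as ' at '; undo that if it is present.
    line = line.replace(" at ", "@", 1)
    fields = line.split()
    queue_instance, qtype, slot_field, load_avg, arch = fields[:5]
    queue, _, host = queue_instance.partition("@")
    # Assumed column meaning: reserved / used / total slots.
    reserved, used, total = (int(x) for x in slot_field.split("/"))
    return {
        "queue": queue,
        "host": host,
        "qtype": qtype,
        "reserved": reserved,
        "used": used,
        "total": total,
        "free": total - used,   # simple difference; ignores reservations
        "load_avg": load_avg,   # kept as text, since it is not always numeric
        "arch": arch,
    }

LINE = "all.q@compute-0-7.local        BIP   0/48/48        29.05    lx26-amd64"
info = parse_queue_instance(LINE)
print(info["host"], "used", info["used"], "of", info["total"],
      "free", info["free"], "load", info["load_avg"])
```

The load average of 29.05 on the same line is in step with the 29
processes that ps reports, so the mismatch appears to be between the
scheduler's slot accounting and what is really running on the node.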

All the jobs in this cluster are serial jobs. Any idea why
I can't run 18 more jobs on compute-0-7? I restarted the
qmaster but it didn't make any difference.

Cordially,


-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlforrest at berkeley.edu

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=302517
