[GE users] SGE jobs stuck in pending state

emallove ethan.mallove at sun.com
Fri Jul 24 16:13:53 BST 2009


On Thu, Jul/23/2009 06:57:48PM, craffi wrote:
> What is the output of "qconf -sq all.q" ?

$ qconf -sq all.q
qname                 all.q
hostlist              NONE
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 1,[burl-ct-v20z-11=2],[burl-ct-v20z-34=2], \
                      [burl-ct-v20z-35=2],[burl-ct-v20z-36=2], \
                      [burl-ct-v20z-37=2],[burl-ct-v20z-38=2], \
                      [burl-ct-v20z-4=2],[burl-ct-v20z-39=2], \
                      [burl-ct-v20z-40=2],[burl-ct-v20z-22=2], \
                      [burl-ct-v20z-3=2],[burl-ct-v20z-31=2], \
                      [burl-ct-v20z-41=2],[burl-ct-v20z-42=2], \
                      [burl-ct-v20z-43=2],[burl-ct-v20z-44=2], \
                      [burl-ct-v20z-45=2],[burl-ct-v20z-46=2], \
                      [burl-ct-v20z-47=2],[burl-ct-v20z-48=2], \
                      [burl-ct-v20z-5=2],[burl-ct-v20z-49=2], \
                      [burl-ct-v20z-50=2],[burl-ct-v20z-51=2], \
                      [burl-ct-v20z-53=2],[burl-ct-v20z-54=2], \
                      [burl-ct-v20z-55=2],[burl-ct-v20z-56=2], \
                      [burl-ct-v20z-57=2],[burl-ct-v20z-58=2], \
                      [burl-ct-v20z-59=2],[burl-ct-v20z-60=2], \
                      [burl-ct-v20z-61=2],[burl-ct-v20z-62=2], \
                      [burl-ct-v20z-63=2],[burl-ct-v20z-64=2], \
                      [burl-ct-v20z-65=2],[burl-ct-v20z-66=2], \
                      [burl-ct-v20z-67=2],[burl-ct-v20z-7=2], \
                      [burl-ct-v20z-68=2],[burl-ct-v20z-69=2], \
                      [burl-ct-v20z-8=2],[burl-ct-v20z-6=2], \
                      [burl-ct-v20z-86=2],[burl-ct-v40z-1=8], \
                      [burl-ct-v440-1=4],[burl-ct-v440-0=4], \
                      [burl-ct-v440-5=4],[burl-ct-v440-6=4], \
                      [burl-ct-v40z-0=7],[burl-ct-v440-3=4], \
                      [burl-ct-v440-7=4],[burl-ct-v440-2=4], \
                      [burl-ct-280r-1=2],[burl-ct-280r-6=2], \
                      [burl-ct-280r-0=2],[burl-ct-280r-2=2], \
                      [burl-ct-280r-3=2],[burl-ct-280r-5=2], \
                      [burl-ct-280r-4=2],[burl-ct-280r-7=2], \
                      [burl-ct-280r-8=2],[burl-ct-v20z-101=2], \
                      [burl-ct-v20z-72=2],[burl-ct-v20z-74=2], \
                      [burl-ct-v20z-79=2],[burl-ct-v20z-73=2], \
                      [burl-ct-v20z-71=2],[burl-ct-v20z-76=2], \
                      [burl-ct-v20z-77=2],[burl-ct-v20z-70=2], \
                      [burl-ct-v20z-75=2],[burl-ct-v20z-84=2], \
                      [burl-ct-v20z-88=2],[burl-ct-v20z-78=2], \
                      [burl-ct-v20z-96=2],[burl-ct-v20z-81=2], \
                      [burl-ct-v20z-92=2],[burl-ct-v20z-94=2], \
                      [burl-ct-v20z-90=2],[burl-ct-v20z-95=2], \
                      [burl-ct-v20z-83=2],[burl-ct-v20z-91=2], \
                      [burl-ct-v20z-85=2],[burl-ct-v20z-89=2], \
                      [burl-ct-v20z-87=2],[burl-ct-v20z-82=2], \
                      [burl-ct-v20z-93=2],[burl-ct-v20z-97=2], \
                      [burl-ct-v20z-98=2],[burl-ct-v20z-99=2], \
                      [burl-ct-280r-11=2],[burl-ct-280r-10=2], \
                      [burl-ct-280r-12=2],[burl-ct-280r-13=2], \
                      [burl-ct-280r-16=2],[burl-ct-280r-15=2], \
                      [burl-ct-280r-14=2],[burl-ct-280r-17=2], \
                      [burl-ct-280r-18=2],[burl-ct-280r-20=2], \
                      [burl-ct-280r-21=2],[burl-ct-280r-24=2], \
                      [burl-ct-280r-25=2],[burl-ct-280r-23=2], \
                      [burl-ct-280r-26=2],[burl-ct-280r-29=2], \
                      [burl-ct-280r-31=2],[burl-ct-280r-33=2], \
                      [burl-ct-280r-9=2],[burl-ct-280r-19=2], \
                      [burl-ct-280r-22=2],[burl-ct-280r-36=2], \
                      [burl-ct-280r-37=2],[burl-ct-280r-35=2], \
                      [burl-ct-280r-39=2],[burl-ct-280r-40=2], \
                      [burl-ct-280r-41=2],[burl-ct-280r-38=2], \
                      [burl-ct-280r-42=2],[burl-ct-280r-43=2], \
                      [burl-ct-280r-45=2],[burl-ct-280r-48=2], \
                      [burl-ct-280r-49=2],[burl-ct-280r-51=2], \
                      [burl-ct-280r-50=2],[burl-ct-280r-53=2], \
                      [burl-ct-280r-52=2],[burl-ct-280r-54=2], \
                      [burl-ct-280r-55=2],[burl-ct-280r-59=2], \
                      [burl-ct-280r-56=2],[burl-ct-280r-58=2], \
                      [burl-ct-280r-60=2],[burl-ct-280r-57=1], \
                      [burl-ct-280r-62=2],[burl-ct-280r-65=2], \
                      [burl-ct-280r-63=2],[burl-ct-280r-64=2], \
                      [burl-ct-280r-67=2],[burl-ct-280r-69=2], \
                      [burl-ct-280r-70=2],[burl-ct-280r-68=2], \
                      [burl-ct-280r-71=2],[burl-ct-280r-72=2], \
                      [burl-ct-280r-73=2],[burl-ct-280r-77=2], \
                      [burl-ct-280r-78=2],[burl-ct-280r-79=2], \
                      [burl-ct-280r-80=2],[burl-ct-280r-81=2], \
                      [burl-ct-280r-82=2],[burl-ct-280r-83=2], \
                      [burl-ct-280r-84=2],[burl-ct-280r-85=2], \
                      [burl-ct-280r-86=2],[burl-ct-280r-89=2], \
                      [burl-ct-280r-91=2],[burl-ct-280r-90=2], \
                      [burl-ct-280r-93=2],[burl-ct-280r-92=2], \
                      [burl-ct-280r-94=2],[burl-ct-280r-88=2], \
                      [burl-ct-280r-95=2],[burl-ct-280r-96=2], \
                      [burl-ct-280r-97=2],[burl-ct-280r-99=2], \
                      [burl-ct-280r-101=2],[burl-ct-280r-100=2], \
                      [burl-ct-280r-98=2],[burl-ct-280r-102=2], \
                      [burl-ct-280r-103=2],[burl-ct-280r-104=2], \
                      [burl-ct-280r-106=2],[burl-ct-280r-105=2]
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
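
(Side note, now that I see it pasted: "hostlist" is NONE above, so
all.q has no execution hosts attached to it, which on its own would
leave every job pending; it would also fit the "All queues dropped"
scheduling info and the zero TOTAL counts in "qstat -g c" below. If
that turns out to be the culprit, a minimal sketch of a fix, borrowing
one of the host names from the slots list purely as an example:

  # Attach an exec host to all.q (host name here is illustrative):
  $ qconf -aattr queue hostlist burl-ct-v20z-11 all.q

  # Or edit the queue in $EDITOR and set hostlist directly:
  $ qconf -mq all.q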

-Ethan


> 
> -Chris
> 
> 
> On Jul 23, 2009, at 6:02 PM, emallove wrote:
> 
> > Hello,
> >
> > All my jobs are getting stuck in the "pending" state, e.g.,
> >
> >  $ qconf -au em162155 user_lists
> >  "em162155" is already in access list "user_lists"
> >
> >  $ qsub /tmp/hostname.sh
> >  Unable to run job: warning: em162155 your job is not allowed to run in any queue
> >  Your job 8 ("hostname.sh") has been submitted.
> >  Exiting.
> >
> >  $ qconf -sql
> >  all.q
> >  default
> >
> >  $ qstat -f
> >  ############################################################################
> >   - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> >  ############################################################################
> >        1 0.75000 hostname.s em162155     qw    07/15/2009 16:11:46     1
> >        2 0.74958 hostname.s em162155     qw    07/15/2009 16:21:29     1
> >        3 0.74955 hostname.s em162155     qw    07/15/2009 16:22:19     1
> >        4 0.74944 hostname.s em162155     qw    07/15/2009 16:24:47     1
> >        5 0.74912 hostname.s em162155     qw    07/15/2009 16:32:08     1
> >        6 0.74911 hostname.s em162155     qw    07/15/2009 16:32:23     1
> >        8 0.25000 hostname.s em162155     qw    07/23/2009 17:43:42     1
> >
> >  $ qstat -g c
> >  CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
> >  --------------------------------------------------------------------------------
> >  all.q                             -NA-      0      0      0      0      0      0
> >  default                           -NA-      0      0      0      0      0      0
> >
> >  $ qstat -j 1
> >  ==============================================================
> >  job_number:                 1
> >  ...
> >  scheduling info:            All queues dropped because of overload or full
> >
> >  $ qstat -V |& head -1
> >  GE 6.2u3
> >
> > Any idea how to fix this?
> >
> > Note: I only have the sge_execd daemon running on two hosts (the
> > master plus one other host), because I'm trying to set up a small
> > sandbox SGE configuration before scaling up to a larger one. All my
> > daemons are running as user "em162155".
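> >
> > As a sanity check on this end (assuming I'm reading the man pages
> > right), these should confirm that both exec hosts are registered
> > with the qmaster and attached to a queue:
> >
> >   $ qconf -sel   # list the registered execution hosts
> >   $ qhost -q     # show the queue instances on each host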
> >
> > Thanks,
> > Ethan
> >
