[GE users] SGE jobs stuck in pending state

templedf dan.templeton at sun.com
Fri Jul 24 18:13:56 BST 2009


Oh, well that's the problem.  Look at the hostlist line.  It says NONE.
Either replace it with "@allhosts" and make sure your hosts are in the
@allhosts host group, or replace it with an explicit list of all your hosts.
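For example (a rough sketch, not tested here; the host name below is just
one picked from your slots list), you could point the queue at the host
group and make sure that group actually contains your execution hosts:

  # point all.q at the @allhosts host group
  $ qconf -mattr queue hostlist @allhosts all.q

  # add an execution host to @allhosts if the group is empty
  $ qconf -aattr hostgroup hostlist burl-ct-v20z-11 @allhosts

  # verify
  $ qconf -shgrp @allhosts
  $ qconf -sq all.q | grep hostlist

Alternatively, "qconf -mq all.q" opens the queue definition in an editor
so you can fix the hostlist line by hand.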

Daniel

emallove wrote:
> On Thu, Jul/23/2009 06:57:48PM, craffi wrote:
>
>> What is the output of "qconf -sq all.q" ?
>>
>
> $ qconf -sq all.q
> qname                 all.q
> hostlist              NONE
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make
> rerun                 FALSE
> slots                 1,[burl-ct-v20z-11=2],[burl-ct-v20z-34=2], \
>                       [burl-ct-v20z-35=2],[burl-ct-v20z-36=2], \
>                       [burl-ct-v20z-37=2],[burl-ct-v20z-38=2], \
>                       [burl-ct-v20z-4=2],[burl-ct-v20z-39=2], \
>                       [burl-ct-v20z-40=2],[burl-ct-v20z-22=2], \
>                       [burl-ct-v20z-3=2],[burl-ct-v20z-31=2], \
>                       [burl-ct-v20z-41=2],[burl-ct-v20z-42=2], \
>                       [burl-ct-v20z-43=2],[burl-ct-v20z-44=2], \
>                       [burl-ct-v20z-45=2],[burl-ct-v20z-46=2], \
>                       [burl-ct-v20z-47=2],[burl-ct-v20z-48=2], \
>                       [burl-ct-v20z-5=2],[burl-ct-v20z-49=2], \
>                       [burl-ct-v20z-50=2],[burl-ct-v20z-51=2], \
>                       [burl-ct-v20z-53=2],[burl-ct-v20z-54=2], \
>                       [burl-ct-v20z-55=2],[burl-ct-v20z-56=2], \
>                       [burl-ct-v20z-57=2],[burl-ct-v20z-58=2], \
>                       [burl-ct-v20z-59=2],[burl-ct-v20z-60=2], \
>                       [burl-ct-v20z-61=2],[burl-ct-v20z-62=2], \
>                       [burl-ct-v20z-63=2],[burl-ct-v20z-64=2], \
>                       [burl-ct-v20z-65=2],[burl-ct-v20z-66=2], \
>                       [burl-ct-v20z-67=2],[burl-ct-v20z-7=2], \
>                       [burl-ct-v20z-68=2],[burl-ct-v20z-69=2], \
>                       [burl-ct-v20z-8=2],[burl-ct-v20z-6=2], \
>                       [burl-ct-v20z-86=2],[burl-ct-v40z-1=8], \
>                       [burl-ct-v440-1=4],[burl-ct-v440-0=4], \
>                       [burl-ct-v440-5=4],[burl-ct-v440-6=4], \
>                       [burl-ct-v40z-0=7],[burl-ct-v440-3=4], \
>                       [burl-ct-v440-7=4],[burl-ct-v440-2=4], \
>                       [burl-ct-280r-1=2],[burl-ct-280r-6=2], \
>                       [burl-ct-280r-0=2],[burl-ct-280r-2=2], \
>                       [burl-ct-280r-3=2],[burl-ct-280r-5=2], \
>                       [burl-ct-280r-4=2],[burl-ct-280r-7=2], \
>                       [burl-ct-280r-8=2],[burl-ct-v20z-101=2], \
>                       [burl-ct-v20z-72=2],[burl-ct-v20z-74=2], \
>                       [burl-ct-v20z-79=2],[burl-ct-v20z-73=2], \
>                       [burl-ct-v20z-71=2],[burl-ct-v20z-76=2], \
>                       [burl-ct-v20z-77=2],[burl-ct-v20z-70=2], \
>                       [burl-ct-v20z-75=2],[burl-ct-v20z-84=2], \
>                       [burl-ct-v20z-88=2],[burl-ct-v20z-78=2], \
>                       [burl-ct-v20z-96=2],[burl-ct-v20z-81=2], \
>                       [burl-ct-v20z-92=2],[burl-ct-v20z-94=2], \
>                       [burl-ct-v20z-90=2],[burl-ct-v20z-95=2], \
>                       [burl-ct-v20z-83=2],[burl-ct-v20z-91=2], \
>                       [burl-ct-v20z-85=2],[burl-ct-v20z-89=2], \
>                       [burl-ct-v20z-87=2],[burl-ct-v20z-82=2], \
>                       [burl-ct-v20z-93=2],[burl-ct-v20z-97=2], \
>                       [burl-ct-v20z-98=2],[burl-ct-v20z-99=2], \
>                       [burl-ct-280r-11=2],[burl-ct-280r-10=2], \
>                       [burl-ct-280r-12=2],[burl-ct-280r-13=2], \
>                       [burl-ct-280r-16=2],[burl-ct-280r-15=2], \
>                       [burl-ct-280r-14=2],[burl-ct-280r-17=2], \
>                       [burl-ct-280r-18=2],[burl-ct-280r-20=2], \
>                       [burl-ct-280r-21=2],[burl-ct-280r-24=2], \
>                       [burl-ct-280r-25=2],[burl-ct-280r-23=2], \
>                       [burl-ct-280r-26=2],[burl-ct-280r-29=2], \
>                       [burl-ct-280r-31=2],[burl-ct-280r-33=2], \
>                       [burl-ct-280r-9=2],[burl-ct-280r-19=2], \
>                       [burl-ct-280r-22=2],[burl-ct-280r-36=2], \
>                       [burl-ct-280r-37=2],[burl-ct-280r-35=2], \
>                       [burl-ct-280r-39=2],[burl-ct-280r-40=2], \
>                       [burl-ct-280r-41=2],[burl-ct-280r-38=2], \
>                       [burl-ct-280r-42=2],[burl-ct-280r-43=2], \
>                       [burl-ct-280r-45=2],[burl-ct-280r-48=2], \
>                       [burl-ct-280r-49=2],[burl-ct-280r-51=2], \
>                       [burl-ct-280r-50=2],[burl-ct-280r-53=2], \
>                       [burl-ct-280r-52=2],[burl-ct-280r-54=2], \
>                       [burl-ct-280r-55=2],[burl-ct-280r-59=2], \
>                       [burl-ct-280r-56=2],[burl-ct-280r-58=2], \
>                       [burl-ct-280r-60=2],[burl-ct-280r-57=1], \
>                       [burl-ct-280r-62=2],[burl-ct-280r-65=2], \
>                       [burl-ct-280r-63=2],[burl-ct-280r-64=2], \
>                       [burl-ct-280r-67=2],[burl-ct-280r-69=2], \
>                       [burl-ct-280r-70=2],[burl-ct-280r-68=2], \
>                       [burl-ct-280r-71=2],[burl-ct-280r-72=2], \
>                       [burl-ct-280r-73=2],[burl-ct-280r-77=2], \
>                       [burl-ct-280r-78=2],[burl-ct-280r-79=2], \
>                       [burl-ct-280r-80=2],[burl-ct-280r-81=2], \
>                       [burl-ct-280r-82=2],[burl-ct-280r-83=2], \
>                       [burl-ct-280r-84=2],[burl-ct-280r-85=2], \
>                       [burl-ct-280r-86=2],[burl-ct-280r-89=2], \
>                       [burl-ct-280r-91=2],[burl-ct-280r-90=2], \
>                       [burl-ct-280r-93=2],[burl-ct-280r-92=2], \
>                       [burl-ct-280r-94=2],[burl-ct-280r-88=2], \
>                       [burl-ct-280r-95=2],[burl-ct-280r-96=2], \
>                       [burl-ct-280r-97=2],[burl-ct-280r-99=2], \
>                       [burl-ct-280r-101=2],[burl-ct-280r-100=2], \
>                       [burl-ct-280r-98=2],[burl-ct-280r-102=2], \
>                       [burl-ct-280r-103=2],[burl-ct-280r-104=2], \
>                       [burl-ct-280r-106=2],[burl-ct-280r-105=2]
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> -Ethan
>
>
>
>> -Chris
>>
>>
>> On Jul 23, 2009, at 6:02 PM, emallove wrote:
>>
>>
>>> Hello,
>>>
>>> All my jobs are getting stuck in the "pending" state, e.g.,
>>>
>>>  $ qconf -au em162155 user_lists
>>>  "em162155" is already in access list "user_lists"
>>>
>>>  $ qsub /tmp/hostname.sh
>>>  Unable to run job: warning: em162155 your job is not allowed to run in any queue
>>>  Your job 8 ("hostname.sh") has been submitted.
>>>  Exiting.
>>>
>>>  $ qconf -sql
>>>  all.q
>>>  default
>>>
>>>  $ qstat -f
>>>
>>> ############################################################################
>>>   - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>>
>>> ############################################################################
>>>        1 0.75000 hostname.s em162155     qw    07/15/2009 16:11:46     1
>>>        2 0.74958 hostname.s em162155     qw    07/15/2009 16:21:29     1
>>>        3 0.74955 hostname.s em162155     qw    07/15/2009 16:22:19     1
>>>        4 0.74944 hostname.s em162155     qw    07/15/2009 16:24:47     1
>>>        5 0.74912 hostname.s em162155     qw    07/15/2009 16:32:08     1
>>>        6 0.74911 hostname.s em162155     qw    07/15/2009 16:32:23     1
>>>        8 0.25000 hostname.s em162155     qw    07/23/2009 17:43:42     1
>>>
>>>  $ qstat -g c
>>>  CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
>>> --------------------------------------------------------------------------------
>>>  all.q                             -NA-      0      0      0      0      0      0
>>>  default                           -NA-      0      0      0      0      0      0
>>>
>>>  $ qstat -j 1
>>>  ==============================================================
>>>  job_number:                 1
>>>  ...
>>>  scheduling info:            All queues dropped because of overload or full
>>>
>>>  $ qstat -V |& head -1
>>>  GE 6.2u3
>>>
>>> Any idea how to fix this?
>>>
>>> Note: I only have the sge_execd daemon running on two hosts (the
>>> master + another host), because I'm trying to configure a small
>>> sandbox SGE configuration before scaling up to a large one. All my
>>> daemons are running as user "em162155".
>>>
>>> Thanks,
>>> Ethan