[GE users] SGE jobs stuck in pending state

rayson rayrayson at gmail.com
Fri Jul 24 16:36:46 BST 2009


On 7/24/09, russray <rray at semtech.com> wrote:
> You might check that the scheduler is running on the qmaster.  I just found the same problem and resolved it by restarting the scheduler.

He's running SGE 6.2, so the scheduler is a thread of qmaster. As he
is able to run qconf & qsub successfully, qmaster is up.
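
One detail in the quoted output below is worth a look: "qstat -g c" reports
TOTAL 0 for both all.q and default, i.e. neither cluster queue currently
provides any slots. A minimal sketch for spotting that condition follows. The
function name empty_queues is hypothetical, and the field position assumes the
column layout shown in the quoted output (queue name first, TOTAL sixth).

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -g c` output on stdin and print the name
# of every cluster queue whose TOTAL slot count is 0.
# Assumes the column layout quoted in this thread: name, CQLOAD, USED, RES,
# AVAIL, TOTAL, aoACDS, cdsuE.
empty_queues() {
    # Skip the header line and the dashed separator, then test field 6 (TOTAL).
    awk 'NR > 2 && $6 == 0 { print $1 }'
}

# Sample copied from the output quoted later in this message.
sample='CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
all.q                             -NA-      0      0      0      0      0      0
default                           -NA-      0      0      0      0      0      0'

printf '%s\n' "$sample" | empty_queues
```

On a live system the same function could be fed directly, as in
"qstat -g c | empty_queues". For any queue it lists, "qconf -sq <queue>"
shows the queue's hostlist and slots settings. Whether those settings are the
cause here is an assumption, not something the output confirms.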

Rayson



> From: emallove <ethan.mallove at sun.com>
> Date: 07/23/2009 06:04 PM
> Please respond to: users <users at gridengine.sunsource.net>
> To: users at gridengine.sunsource.net
> Subject: [GE users] SGE jobs stuck in pending state
>
> Hello,
>
> All my jobs are getting stuck in the "pending" state, e.g.,
>
>  $ qconf -au em162155 user_lists
>  "em162155" is already in access list "user_lists"
>
>  $ qsub /tmp/hostname.sh
>  Unable to run job: warning: em162155 your job is not allowed to run in any queue
>  Your job 8 ("hostname.sh") has been submitted.
>  Exiting.
>
>  $ qconf -sql
>  all.q
>  default
>
>  $ qstat -f
>  ############################################################################
>   - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>  ############################################################################
>        1 0.75000 hostname.s em162155     qw    07/15/2009 16:11:46     1
>        2 0.74958 hostname.s em162155     qw    07/15/2009 16:21:29     1
>        3 0.74955 hostname.s em162155     qw    07/15/2009 16:22:19     1
>        4 0.74944 hostname.s em162155     qw    07/15/2009 16:24:47     1
>        5 0.74912 hostname.s em162155     qw    07/15/2009 16:32:08     1
>        6 0.74911 hostname.s em162155     qw    07/15/2009 16:32:23     1
>        8 0.25000 hostname.s em162155     qw    07/23/2009 17:43:42     1
>
>  $ qstat -g c
>  CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
>  --------------------------------------------------------------------------------
>  all.q                             -NA-      0      0      0      0      0      0
>  default                           -NA-      0      0      0      0      0      0
>
>  $ qstat -j 1
>  ==============================================================
>  job_number:                 1
>  ...
>  scheduling info:            All queues dropped because of overload or full
>
>  $ qstat -V |& head -1
>  GE 6.2u3
>
> Any idea how to fix this?
>
> Note: I only have the sge_execd daemon running on two hosts (the
> master + another host), because I'm trying to configure a small
> sandbox SGE configuration before scaling up to a large one. All my
> daemons are running as user "em162155".
>
> Thanks,
> Ethan
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=209256
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=209352
