[GE users] SGE jobs stuck in pending state

templedf dan.templeton at sun.com
Fri Jul 24 18:10:21 BST 2009


You either have no queues or no hosts in your queues.  Do the following.

% qconf -sql

You should see a list of queues.  If not, that's the problem.  Fix it by 
adding a queue with qconf -aq.

Pick a queue, like all.q, which should have been installed by default.

% qconf -sq all.q | grep hostlist

You should see the word hostlist followed by something.  If that 
something is "NONE", then that's your problem.  Fix it by replacing the 
"NONE" with the names of your hosts, comma- or space-delimited, via 
qconf -mq all.q.  If you see something that looks like @allhosts, then 
that's good.  That means your queue was configured to use a host group, 
so check next that the group itself contains hosts:

% qconf -shgrp @allhosts | grep hostlist

You should see the word hostlist followed by something.  If that 
something is "NONE", then that's your problem.  Fix it by replacing the 
"NONE" with the names of your hosts, comma- or space-delimited, via 
qconf -mhgrp @allhosts.
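
If you'd rather run all three checks in one go, here is a rough sketch that 
chains them.  It only uses the qconf switches mentioned above (-sql, -sq, 
-shgrp).  The helper names hostlist_status and hostlist_first are made up for 
this example, and it assumes the hostlist entry sits on a single line with 
"hostlist" as the first word, so a hostlist that wraps onto continuation 
lines is judged by its first entry only.

```shell
#!/bin/sh
# Sketch only: hostlist_status and hostlist_first are hypothetical helpers,
# not part of Grid Engine. Both read a "qconf -sq" or "qconf -shgrp" dump
# from stdin.

# Print one of: missing, empty, hostgroup, hosts.
# Assumption: only the first entry after "hostlist" is inspected.
hostlist_status() {
    awk '
        $1 == "hostlist" {
            found = 1
            if ($2 == "" || $2 == "NONE") print "empty"
            else if ($2 ~ /^@/)           print "hostgroup"
            else                          print "hosts"
            exit
        }
        END { if (!found) print "missing" }
    '
}

# Print the first entry of the hostlist line (e.g. @allhosts).
hostlist_first() {
    awk '$1 == "hostlist" { print $2; exit }'
}

# Only touch the cluster if qconf is actually on the PATH.
if command -v qconf >/dev/null 2>&1; then
    queues=$(qconf -sql 2>/dev/null)
    [ -n "$queues" ] || echo "no queues defined; add one with qconf -aq"
    for q in $queues; do
        cfg=$(qconf -sq "$q")
        status=$(printf '%s\n' "$cfg" | hostlist_status)
        echo "queue $q: hostlist is $status"
        if [ "$status" = hostgroup ]; then
            grp=$(printf '%s\n' "$cfg" | hostlist_first)
            echo "  group $grp: hostlist is $(qconf -shgrp "$grp" | hostlist_status)"
        fi
    done
fi
```

Anything reported as "empty" is where the "NONE" needs replacing, via 
qconf -mq for a queue or qconf -mhgrp for a host group.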

Daniel

emallove wrote:
> On Fri, Jul/24/2009 11:26:58AM, craffi wrote:
>   
>> Does the output of "qstat -f" really not show you the state of your  
>> queues and queue instances and only shows the pending jobs?
>>     
>
> Correct. Below is the qstat output verbatim. qstat prints the same
> info from the qmaster node as from my one other non-qmaster node,
> which I assume should always be the case. Interestingly, jobs
> submitted as "root" show the same "unable to run" error, but then do
> not show up in the qstat -f output, e.g., notice job 9 does not show
> up in qstat:
>
>   $ sudo qsub /home/em162155/tmp/hostname.sh
>   Unable to run job: warning: root your job is not allowed to run in any queue
>   Your job 9 ("hostname.sh") has been submitted.
>   Exiting.
>   $ qstat -f
>
>   ############################################################################
>    - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>   ############################################################################
>         1 0.75000 hostname.s em162155     qw    07/15/2009 16:11:46     1
>         2 0.74958 hostname.s em162155     qw    07/15/2009 16:21:29     1
>         3 0.74955 hostname.s em162155     qw    07/15/2009 16:22:19     1
>         4 0.74944 hostname.s em162155     qw    07/15/2009 16:24:47     1
>         5 0.74912 hostname.s em162155     qw    07/15/2009 16:32:08     1
>         6 0.74911 hostname.s em162155     qw    07/15/2009 16:32:23     1
>         8 0.25000 hostname.s em162155     qw    07/23/2009 17:43:42     1
>
> Now, notice job 10 *does* show up in qstat:
>
>   $ qsub /home/em162155/tmp/hostname.sh
>   Unable to run job: warning: em162155 your job is not allowed to run in any queue
>   Your job 10 ("hostname.sh") has been submitted.
>   Exiting.
>   $ qstat -f
>
>   ############################################################################
>    - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>   ############################################################################
>         1 0.75000 hostname.s em162155     qw    07/15/2009 16:11:46     1
>         2 0.74962 hostname.s em162155     qw    07/15/2009 16:21:29     1
>         3 0.74959 hostname.s em162155     qw    07/15/2009 16:22:19     1
>         4 0.74949 hostname.s em162155     qw    07/15/2009 16:24:47     1
>         5 0.74920 hostname.s em162155     qw    07/15/2009 16:32:08     1
>         6 0.74919 hostname.s em162155     qw    07/15/2009 16:32:23     1
>         8 0.29364 hostname.s em162155     qw    07/23/2009 17:43:42     1
>        10 0.25000 hostname.s em162155     qw    07/24/2009 12:14:14     1
>
>   $ qconf |& head -1
>   GE 6.2u3
>
> -Ethan
>
>   
>> -Chris
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=209349
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>     
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=209364

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


