[GE users] SGE jobs stuck in pending state

russray rray at semtech.com
Fri Jul 24 16:34:02 BST 2009


You might check that the scheduler is running on the qmaster.  I just ran into the same problem and resolved it by restarting the scheduler.
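
A minimal sketch of that check, assuming the SGE client tools are on the PATH and a 6.2-style setup where the scheduler runs as a thread inside sge_qmaster. The function name scheduler_status is made up for illustration. It relies only on `qconf -sss`, which reports the host where the scheduler is active. The restart commands in the trailing comment are one possible way to bounce the scheduler thread on 6.2 and may not be the method used above.

```shell
#!/bin/sh
# Hypothetical helper: report whether the scheduler answers a status query.
# Assumes `qconf` is on PATH (e.g. after sourcing the cell's settings file).
scheduler_status() {
    if ! command -v qconf >/dev/null 2>&1; then
        echo "qconf not found; source the SGE settings file first"
        return 2
    fi
    # `qconf -sss` shows the scheduler status (the host it is active on).
    if host=$(qconf -sss 2>/dev/null) && [ -n "$host" ]; then
        echo "scheduler active on: $host"
    else
        echo "scheduler not responding"
        return 1
    fi
}

# Possible restart on 6.2, where the scheduler is a qmaster thread
# (assumption; run as an SGE manager or admin user):
#   qconf -kt scheduler && qconf -at scheduler
```

Calling scheduler_status on the qmaster host before and after a restart shows whether the scheduler came back.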




emallove <ethan.mallove at sun.com>
07/23/2009 06:04 PM
Please respond to: users <users at gridengine.sunsource.net>
To: users at gridengine.sunsource.net
Subject: [GE users] SGE jobs stuck in pending state

Hello,

All my jobs are getting stuck in the "pending" state, e.g.,

 $ qconf -au em162155 user_lists
 "em162155" is already in access list "user_lists"

 $ qsub /tmp/hostname.sh
 Unable to run job: warning: em162155 your job is not allowed to run in any queue
 Your job 8 ("hostname.sh") has been submitted.
 Exiting.

 $ qconf -sql
 all.q
 default

 $ qstat -f
 ############################################################################
  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
 ############################################################################
       1 0.75000 hostname.s em162155     qw    07/15/2009 16:11:46     1
       2 0.74958 hostname.s em162155     qw    07/15/2009 16:21:29     1
       3 0.74955 hostname.s em162155     qw    07/15/2009 16:22:19     1
       4 0.74944 hostname.s em162155     qw    07/15/2009 16:24:47     1
       5 0.74912 hostname.s em162155     qw    07/15/2009 16:32:08     1
       6 0.74911 hostname.s em162155     qw    07/15/2009 16:32:23     1
       8 0.25000 hostname.s em162155     qw    07/23/2009 17:43:42     1

 $ qstat -g c
 CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
 --------------------------------------------------------------------------------
 all.q                             -NA-      0      0      0      0      0      0
 default                           -NA-      0      0      0      0      0      0

 $ qstat -j 1
 ==============================================================
 job_number:                 1
 ...
 scheduling info:            All queues dropped because of overload or full

 $ qstat -V |& head -1
 GE 6.2u3

Any idea how to fix this?

Note: I only have the sge_execd daemon running on two hosts (the
master + another host), because I'm trying to configure a small
sandbox SGE configuration before scaling up to a large one. All my
daemons are running as user "em162155".
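
The `qstat -g c` output above shows TOTAL as 0 for both cluster queues, meaning no slots are reported for either of them. A small sketch for spotting that condition in saved or piped output follows. The function name queues_without_slots is made up for illustration, and the column positions assume the 6.2 layout shown above (queue name in field 1, TOTAL in field 6).

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -g c` output on stdin and print the
# cluster queues whose TOTAL slot count (6th field) is 0.
queues_without_slots() {
    # Skip the header line and the dashed separator, then test field 6.
    awk '$1 != "CLUSTER" && $1 !~ /^-+$/ && NF >= 6 && $6 == 0 { print $1 }'
}

# Usage (needs a working SGE client environment):
#   qstat -g c | queues_without_slots
# For each queue printed, the configured host list can be inspected with:
#   qconf -sq <queue> | grep hostlist
# and the registered execution hosts with:
#   qconf -sel
```

An empty result means every cluster queue reports at least one slot, in which case the cause of the pending jobs lies elsewhere.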

Thanks,
Ethan

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=209256
