[GE users] queue status

Ravi Chandra Nallan Ravichandra.Nallan at Sun.COM
Mon Mar 12 16:45:32 GMT 2007


Hi,
 
E indicates that the queue is in the error state.
Running qstat with the -explain E option should give you a hint about what is wrong.
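
If you would rather script the check, here is a minimal sketch in Python. It only builds the two command lines involved: the qstat call with -explain E, and the qmod -c call that the man page excerpt below mentions for re-enabling the queue. The helper names, the extra -f (full listing) flag, and the queue instance name all.q@node01 are my assumptions for illustration, not something taken from your setup.

```python
import subprocess


def explain_command(state="E"):
    """Build a qstat call asking why queue instances are in `state`.

    The -f flag (full queue listing) is an assumption; -explain is the
    option named in this reply.
    """
    return ["qstat", "-f", "-explain", state]


def clear_error_command(queue_instance):
    """Build the qmod call that clears the error state of one queue instance.

    Uses the -c option named in the qstat man page excerpt.
    """
    return ["qmod", "-c", queue_instance]


def run(cmd):
    """Run a command and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


# Hypothetical usage on a host where Grid Engine is installed:
#   print(run(explain_command("E")))
#   run(clear_error_command("all.q@node01"))  # only after fixing the cause
```

Clearing the state before the underlying problem is fixed will most likely just put the queue back into E on the next job start, so read the explanation first.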

From the qstat man page:

     The state  of  the  queue  -  one  of  u(nknown)  if  the
        corresponding  sge_execd(8) cannot be contacted, a(larm),
        A(larm),     C(alendar      suspended),      s(uspended),
        S(ubordinate),  d(isabled), D(isabled), E(rror) or combi-
        nations thereof.

     If the state is a(larm), at least one of the load thresholds
     defined in the load_thresholds list of the queue configuration
     (see queue_conf(5)) is currently exceeded, which prevents
     further jobs from being scheduled to that queue.

     As opposed to this, the  state  A(larm)  indicates  that  at
     least  one  of  the  suspend  thresholds  of  the queue (see
     queue_conf(5)) is currently exceeded. This  will  result  in
     jobs  running  in  that  queue  being successively suspended
     until no threshold is violated.


     The states s(uspended) and d(isabled)  can  be  assigned  to
     queues  and  released  via the qmod(1) command. Suspending a
     queue will cause all jobs executing  in  that  queue  to  be
     suspended.

     The states D(isabled) and C(alendar suspended) indicate that
     the queue has been disabled or suspended automatically via
     the calendar facility of Grid Engine (see calendar_conf(5)),
     while the S(ubordinate) state indicates that the queue has
     been suspended via subordination to another queue (see
     queue_conf(5) for details). When a queue is suspended
     (regardless of the cause), all jobs executing in that queue
     are suspended too.

     If an E(rror) state is displayed for a  queue,  sge_execd(8)
     on  that  host was unable to locate the sge_shepherd(8) exe-
     cutable on that host in order to start a job.  Please  check
     the  error  logfile of that sge_execd(8) for leads on how to
     resolve the problem. Please enable the queue afterwards  via
     the -c option of the qmod(1) command manually.

     If the c(onfiguration ambiguous) state is  displayed  for  a
     queue  instance this indicates that the configuration speci-
     fied for this queue instance in  sge_conf(5)  is  ambiguous.
     This state is cleared when the configuration becomes unambi-
     guous again. This state prevents  further  jobs  from  being
     scheduled  to  that  queue  instance. Detailed reasons why a
     queue instance entered the c(onfiguration  ambiguous)  state
     can  be  found  in  the sge_qmaster(8) messages file and are
     shown by the qstat -explain switch. For queue  instances  in
     this state the cluster queue's default settings are used for
     the ambiguous attribute.

     If an o(rphaned) state is displayed for a queue instance, it
     indicates that the queue instance is no longer demanded by
     the current cluster queue's configuration or the host group
     configuration. The queue instance is kept because jobs that
     have not yet finished are still associated with it, and it
     will vanish from qstat output when these jobs have finished.
     To make an orphaned queue instance vanish sooner, the
     associated job(s) can be deleted using qdel(1). A queue
     instance in o(rphaned) state can be revived by changing the
     cluster queue configuration accordingly to cover that queue
     instance. This state prevents further jobs from being
     scheduled to that queue instance.
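
Since the state column can combine several letters (for example "au"), here is a rough sketch of a decoder for it. The mapping is condensed from the man page text above; the function name and the wording of the descriptions are mine, and letters not covered by the excerpt are reported as such rather than guessed.

```python
# One-line summaries of the queue states, condensed from the qstat(1)
# excerpt quoted above. The wording is a paraphrase, not the man page text.
QUEUE_STATES = {
    "u": "unknown: the sge_execd on that host cannot be contacted",
    "a": "alarm: a load threshold is exceeded, no further jobs are scheduled",
    "A": "Alarm: a suspend threshold is exceeded, running jobs get suspended",
    "C": "Calendar suspended: suspended automatically by the calendar",
    "s": "suspended: suspended via qmod",
    "S": "Subordinate: suspended via subordination to another queue",
    "d": "disabled: disabled via qmod",
    "D": "Disabled: disabled automatically by the calendar",
    "E": "Error: sge_execd could not locate sge_shepherd to start a job",
    "c": "configuration ambiguous: queue instance configuration is ambiguous",
    "o": "orphaned: no longer demanded by the cluster queue configuration",
}


def decode_queue_state(state):
    """Return one description per letter in a qstat state string."""
    return [
        QUEUE_STATES.get(letter, "not covered by the excerpt: " + repr(letter))
        for letter in state
    ]


if __name__ == "__main__":
    for line in decode_queue_state("au"):
        print(line)
```

Note that case matters: "a" and "A", "s" and "S", and "d" and "D" are different states.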

-Ravi

On Mon, 2007-03-12 at 16:16 +0000, Colin Thomas wrote:
> Hello,
> 
>  
> 
> We have a machine that does not want to accept jobs. In the queue
> instance page of qmon, I have seen, and know what "au" means, but we
> are currently seeing "E".
> 
>  
> 
> Any ideas ?
> 
>  
> 
> Many thanks
> 
>  
> 
> Colin Thomas
