[GE users] Problem with clearing a suspended status from a queue instance.

mad margaret_Doll at brown.edu
Wed Oct 20 18:22:11 BST 2010





On Wed, Oct 20, 2010 at 11:48 AM, reuti <reuti at staff.uni-marburg.de> wrote:
On 20.10.2010 at 16:11, mad wrote:

>
> On Wed, Oct 20, 2010 at 9:24 AM, reuti <reuti at staff.uni-marburg.de> wrote:
> Hi,
>
> On 20.10.2010 at 14:49, mad wrote:
>
> > I have tried to remove the Ss status from the queue instance by using qmon and clicking on force and resume. That does not change the status. I rebooted the host on which the queue instance exists; that did not resume the queue.
> >
> > There are no jobs in the queue instance.
> >
> > This queue instance in the het queue is part of a subordinate queue het-2hr.
> >
> > [root@ted g03]# qmod -usq het@compute-0-31
> > [root@ted g03]# qmod -usq het@compute-0-31.local
> > [root@ted g03]# qmod -usq het-2hr@compute-0-31.local
> > Queue instance "het-2hr@compute-0-31.local" is already in the specified state: unsuspended
> > [root@ted g03]# qmod -usq het@compute-0-31.local
> >
> > The queue instance for het@compute-0-31.local in qmon still shows a status of Ss
> >
> > I am running ROCKS 5.0, CentOS (kernel 2.6.18-53.1.14.el5), Grid Engine 6.1u4
>
> I think an uppercase S means subordinated. Is there anything running in a superordinated queue?
>
> -- Reuti
>
> het is the queue including compute-0-30 through compute-0-33 which is subordinate to het-24hr and het-2hr.

Subordinated by slot? There is an issue where the last slot won't get un-suspended again and stays in "S".


>
> het-24hr is a queue containing compute-0-30.  There is currently a job taking all slots in this queue.
>
> het-2hr is a queue containing compute-0-31 and compute-0-32.  There are no jobs in this queue.
>
> compute-0-31 is the instance showing the Ss status.
>
> CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
> -------------------------------------------------------------------------------
> het                               0.20      0     24     32      0      8
> het-24hr                          0.78      8      0      8      0      0
> het-2hr                           0.00      0     16     16      0      0

This is strange: why does the "het" queue show its 8 slots not in "S" but in error state "E"? Does:

$ qstat -f

show an error for this queue on certain machines?

--- Reuti

--

qstat -f gives me

het@compute-0-31.local         BIP   0/8       0.00     lx26-amd64    Ss
--
het-2hr@compute-0-31.local     BIP   0/8       0.00     lx26-amd64

The het-2hr instance is fine. This is what qmon is showing as well.
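
To spot such instances without qmon, the state column of qstat -f can be filtered. The following is only a sketch: it assumes the six-column layout quoted in this message, with the states in the sixth field, and it runs on a pasted sample (with the archive's " at " written back as "@") rather than on a live cluster. The variable names sample and suspended are made up for the illustration.

```shell
#!/bin/sh
# Sample of `qstat -f` output as quoted in this thread (assumed layout:
# queuename, qtype, used/tot, load_avg, arch, states).
sample='het@compute-0-31.local         BIP   0/8       0.00     lx26-amd64    Ss
--
het-2hr@compute-0-31.local     BIP   0/8       0.00     lx26-amd64'

# Keep only lines that have a sixth (states) field containing an
# uppercase "S", i.e. suspended by subordination, and print the
# queue instance name from the first field.
suspended=$(printf '%s\n' "$sample" | awk 'NF >= 6 && $6 ~ /S/ { print $1 }')

echo "$suspended"
```

On a live cluster the same awk filter could be fed from qstat -f directly instead of from the pasted sample; other output layouts would need a different field number.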


