[GE users] Problem with clearing a suspended status from a queue instance.

mad margaret_Doll at brown.edu
Wed Oct 20 20:07:44 BST 2010





On Wed, Oct 20, 2010 at 2:42 PM, reuti <reuti at staff.uni-marburg.de> wrote:
On 20.10.2010 at 19:22, mad wrote:

> On Wed, Oct 20, 2010 at 11:48 AM, reuti <reuti at staff.uni-marburg.de> wrote:
> On 20.10.2010 at 16:11, mad wrote:
>
> >
> > On Wed, Oct 20, 2010 at 9:24 AM, reuti <reuti at staff.uni-marburg.de> wrote:
> > Hi,
> >
> > On 20.10.2010 at 14:49, mad wrote:
> >
> > > I have tried to remove the Ss status from the queue instance by using qmon and clicking Force and Resume. That does not change the status. I rebooted the host on which the queue instance resides; that did not resume the queue.
> > >
> > > There are no jobs in the queue instance.
> > >
> > > This queue instance belongs to the het queue, which is subordinate to het-2hr.
> > >
> > > [root@ted g03]# qmod -usq het@compute-0-31
> > > [root@ted g03]# qmod -usq het@compute-0-31.local
> > > [root@ted g03]# qmod -usq het-2hr@compute-0-31.local
> > > Queue instance "het-2hr@compute-0-31.local" is already in the specified state: unsuspended
> > > [root@ted g03]# qmod -usq het@compute-0-31.local
> > >
> > > The queue instance het@compute-0-31.local still shows a status of Ss in qmon.
> > >
> > > I am running ROCKS 5.0, CentOS (kernel 2.6.18-53.1.14.el5), and Grid Engine 6.1u4.
> >
> > I think an uppercase S means subordinated. Is there anything running in a superordinated queue?
> >
> > -- Reuti
> >
> > het is the queue spanning compute-0-30 through compute-0-33; it is subordinate to het-24hr and het-2hr.
>
> Subordinated by slot? There is an issue where the last slot won't get un-suspended again and stays in "S".

Which type of subordination are you using?

In qmon, I modified het-24hr. Under the Subordinates tab, I entered het as the queue, with a Max of 16 slots.

I wanted jobs submitted to het-24hr to take precedence over jobs submitted to het. The jobs in het would be suspended until the jobs in het-24hr finished.


> > het-24hr is a queue containing compute-0-30.  There is currently a job taking all slots in this queue.
> >
> > het-2hr is a queue containing compute-0-31 and compute-0-32.  There are no jobs in this queue.
> >
> > compute-0-31 is the instance showing the Ss status.
> >
> > CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
> > -------------------------------------------------------------------------------
> > het                               0.20      0     24     32      0      8
> > het-24hr                          0.78      8      0      8      0      0
> > het-2hr                           0.00      0     16     16      0      0
>
> This is strange: why does the "het" queue show 8 slots not in "S" but in error state "E"? Does:
>
> $ qstat -f
>
> show an error for this queue on certain machines?
>
> --- Reuti
>
> --
>
> qstat -f gives me:
> het@compute-0-31.local         BIP   0/8       0.00     lx26-amd64    Ss
> --
> het-2hr@compute-0-31.local     BIP   0/8       0.00     lx26-amd64      is fine.

We still have to investigate where "S" is coming from. Are you sure it is not in any other queue's subordination list? As long as it's in state "S", it seems not even possible to apply `qmod -sq ...` or its opposite. I think this problem shows up under these circumstances; I'll file an issue for it.

I know where the "S" status came from. I was trying to get rid of the "s" status on the queue instance, so I tried Resume and Enable in qmon, along with the Force button. Those actions appeared to do nothing. I then clicked the Suspend button and got the "S" in addition to the "s".

-- Reuti


> which is what qmon is showing as well.
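The cluster-queue summary quoted earlier in the thread can be scanned with a short script. This is a minimal sketch, assuming the Grid Engine 6.1 "qstat -g c" column layout shown there (CQLOAD, USED, AVAIL, TOTAL, aoACDS, cdsuE); the function name report_cdsue is hypothetical. Going by the state letters in the header, cdsuE counts slots in the c (configuration ambiguous), d (disabled), s (suspended), u (unknown) or E (error) states, so the 8 slots in het's row may come from the manually suspended het@compute-0-31.local ("s") rather than from an error.

```shell
#!/bin/sh
# Hypothetical helper: report cluster queues whose cdsuE count is nonzero.
# Reads "qstat -g c" style text on stdin, assuming the 7-column layout
# quoted in this thread (queue, CQLOAD, USED, AVAIL, TOTAL, aoACDS, cdsuE).
report_cdsue() {
  # NR > 2 skips the header line and the dashed separator line.
  # $1 = queue name, $5 = TOTAL slots, $7 = slots in c/d/s/u/E states.
  awk 'NR > 2 && $7 > 0 { printf "%s %s of %s slots in c/d/s/u/E\n", $1, $7, $5 }'
}

# Offline demo using the table from the thread; on a live cluster you
# would run: qstat -g c | report_cdsue
report_cdsue <<'EOF'
CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
-------------------------------------------------------------------------------
het                               0.20      0     24     32      0      8
het-24hr                          0.78      8      0      8      0      0
het-2hr                           0.00      0     16     16      0      0
EOF
```

If your release prints additional columns in this summary, the field numbers in the awk program would need adjusting.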

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=288678




