[GE users] Job stays suspended even when queue instance is unsuspended

opoplawski orion at cora.nwra.com
Tue May 4 16:39:11 BST 2010


On 01/29/2010 08:19 AM, reuti wrote:
> Hi,
>
> On 29.01.2010 at 08:57, murple wrote:
>
>> Hi,
>>
>> I have a job in a subordinate queue which got suspended. Now the queue
>> instance is unsuspended again but the job stays in "S". But if I try to
>> forcibly unsuspend the queue and the job using "qmod -usq" and "qmod
>> -usj" it says both are already unsuspended.
>>
>> Btw. this is 6.2
>
> Plain 6.2, no updates? Or do you refer to the new slotwise
> subordination in u5, where the bug is already known:
>
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=3233

I'm seeing similar behavior in 6.2u5, but I'm not sure what would have
suspended the queues, and there is certainly more than one job stuck per
queue. I am using slotwise subordination now, though.


$ qstat -u \*
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:02 compute.q@font2lin.cora.nwra.c     1 224
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:02 compute.q@josiah.cora.nwra.com     1 231
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:02 compute.q@josiah.cora.nwra.com     1 251
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:02 compute.q@josiah.cora.nwra.com     1 253
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:02 compute.q@josiah.cora.nwra.com     1 258
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:11 compute.q@font1lin.cora.nwra.c     1 271
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:21 compute.q@josiah.cora.nwra.com     1 318
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:21 compute.q@josiah.cora.nwra.com     1 319
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:21 compute.q@josiah.cora.nwra.com     1 320
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:46 compute.q@font3lin.cora.nwra.c     1 412
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:55:02 compute.q@font2lin.cora.nwra.c     1 469
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:55:15 compute.q@font1lin.cora.nwra.c     1 510
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:55:23 compute.q@font1lin.cora.nwra.c     1 538
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:55:24 compute.q@font3lin.cora.nwra.c     1 541
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:55:34 compute.q@font3lin.cora.nwra.c     1 568
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:56:34 compute.q@font2lin.cora.nwra.c     1 724
  16838 0.56000 power.tcsh ash          S     05/04/2010 09:10:32 compute.q@sagacompute1.cora.nw     1 531
  16838 0.56000 power.tcsh ash          S     05/04/2010 09:10:32 compute.q@sagacompute1.cora.nw     1 532
  16838 0.56000 power.tcsh ash          S     05/04/2010 09:11:47 compute.q@andrew.cora.nwra.com     1 854
  16838 0.56000 power.tcsh ash          S     05/04/2010 09:12:43 compute.q@apapane.cora.nwra.co     1 1140
  16838 0.56000 power.tcsh ash          S     05/04/2010 09:13:32 compute.q@apapane.cora.nwra.co     1 1407
  16838 0.56000 power.tcsh ash          S     05/04/2010 09:14:38 compute.q@apapane.cora.nwra.co     1 1777

$ qstat -f -q compute.q
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
compute.q@coop00.cora.nwra.com BIPC  0/0/2          -NA-     lx26-amd64    u
---------------------------------------------------------------------------------
compute.q@coop01.cora.nwra.com BIPC  0/0/2          -NA-     lx26-amd64    u
---------------------------------------------------------------------------------
compute.q@draco.cora.nwra.com  BIPC  0/0/2          -NA-     lx26-amd64    du
---------------------------------------------------------------------------------
compute.q@font1lin.cora.nwra.c BIC   0/3/4          0.00     lx26-amd64
---------------------------------------------------------------------------------
compute.q@font2lin.cora.nwra.c BIC   0/3/4          0.00     lx26-amd64
---------------------------------------------------------------------------------
compute.q@font3lin.cora.nwra.c BIC   0/3/4          0.00     lx26-amd64
---------------------------------------------------------------------------------
compute.q@lyra.cora.nwra.com   BIPC  0/0/2          0.01     lx26-amd64
---------------------------------------------------------------------------------
compute.q@sagacompute1.cora.nw BIC   0/2/8          0.04     lx26-amd64
---------------------------------------------------------------------------------
compute.q@xencompute1.cora.nwr BIPC  0/0/2          0.65     lx26-amd64
---------------------------------------------------------------------------------
compute.q@amos.cora.nwra.com   BIC   0/0/8          0.08     lx26-amd64
---------------------------------------------------------------------------------
compute.q@andrew.cora.nwra.com BIC   0/1/8          0.07     lx26-amd64
---------------------------------------------------------------------------------
compute.q@apapane.cora.nwra.co BIC   0/3/4          0.01     lx26-amd64
---------------------------------------------------------------------------------
compute.q@apollo.cora.nwra.com BIPC  0/0/4          2.23     lx26-amd64
---------------------------------------------------------------------------------
compute.q@apus.cora.nwra.com   BIPC  0/0/4          4.01     lx26-amd64    a
---------------------------------------------------------------------------------
compute.q@castor.cora.nwra.com BIC   0/0/8          0.35     lx26-amd64
---------------------------------------------------------------------------------
compute.q@cetus.cora.nwra.com  BIPC  0/0/4          3.13     lx26-amd64
---------------------------------------------------------------------------------
compute.q@josiah.cora.nwra.com BIC   0/7/8          3.45     lx26-amd64
---------------------------------------------------------------------------------
compute.q@pollux.cora.nwra.com BIPC  0/0/8          3.15     lx26-amd64
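
One way to cross-check the two listings is to tally the tasks in state "S"
per queue instance and compare each count with the "used" figure that
qstat -f reports for the same instance. The following is only a rough
Python sketch, assuming the qstat text has one row per line and writes
queue instances as queue@host; the function name suspended_per_queue and
the three-row sample are illustrative and not part of Grid Engine.

```python
from collections import Counter


def suspended_per_queue(qstat_text):
    """Count tasks whose state contains 'S', keyed by queue instance."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric job ID; this skips the header and rule.
        if len(fields) < 9 or not fields[0].isdigit():
            continue
        state, queue = fields[4], fields[7]
        # Pending jobs have no queue column, so require the "@" of an instance.
        if "S" in state and "@" in queue:
            counts[queue] += 1
    return counts


# Hypothetical sample: three rows in the layout of plain qstat output.
sample = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:02 compute.q@font2lin.cora.nwra.c     1 224
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:02 compute.q@josiah.cora.nwra.com     1 231
  16836 0.56000 power.tcsh ash          S     05/04/2010 08:54:02 compute.q@josiah.cora.nwra.com     1 251
"""

for queue, n in sorted(suspended_per_queue(sample).items()):
    print(queue, n)
```

If every used slot on an instance turns out to be held by a suspended
task, that would suggest the instance is running nothing else.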

Any way to force the unsuspension?


-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com



