[GE users] subordinate queue that stays suspended

Bill Knebel billk at metrumrg.com
Tue May 16 14:11:37 BST 2006


Reuti,

I thought a bit more explanation might be helpful.  I am still seeing the 
problem.  The only way to get the nodes out of the "S" state is to 
remove them from the queue definition for all.q and then immediately add 
them back in.  However, the next time we run a bootstrap.q job they go 
to the "S" state again, and they stay that way until I repeat the same 
process.
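
For anyone wanting to script the remove-and-re-add workaround described above, here is a minimal sketch. It only builds and prints the command lines, it does not run them. It assumes the hosts are listed directly in the hostlist of all.q (not via a host group) and that the workaround is done with qconf -dattr / -aattr; the host names node14, node15 and node16 and the helper requeue_host_commands are hypothetical.

```python
from typing import List


def requeue_host_commands(queue: str, host: str) -> List[List[str]]:
    """Build the two qconf calls that drop a host from a queue's hostlist
    and then add it straight back (hypothetical helper, nothing is executed)."""
    return [
        ["qconf", "-dattr", "queue", "hostlist", host, queue],
        ["qconf", "-aattr", "queue", "hostlist", host, queue],
    ]


# Host names are placeholders for nodes 14, 15 and 16.
for host in ("node14", "node15", "node16"):
    for argv in requeue_host_commands("all.q", host):
        print(" ".join(argv))
```

Review the printed commands before running them by hand as a Grid Engine manager; this only automates the workaround and does not address why the "S" state sticks.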

Any ideas?

Bill

Reuti wrote:

> On 11.05.2006 at 17:31, Bill Knebel wrote:
>
>> Reuti,
>>
>> The nodes are dual-CPU Xeons. In all.q each node (14, 15, and 16) is
>> defined as having 2 slots. The same number of slots is defined in the
>> bootstrap.q for these nodes. We only want one job running per CPU
>> in each node at any one time. In the qconf for the bootstrap.q the
>> "subordinate_list" item is all.q. That is the extent of the
>> subordinate queue setup.
>>
>> Our goal is to suspend the use of nodes 14, 15, and 16 in all.q when
>> jobs are submitted to the bootstrap.q. The bootstrap.q is configured
>> to only use nodes 14, 15, and 16.
>
>
> But this way you could get three jobs on a node: one in bootstrap.q
> and two in all.q, as all.q is only suspended if both slots in
> bootstrap.q are used up. Anyway, this isn't your issue. With qstat -f
> and qhost -q, you don't see anything still running on these machines?
>
> -- Reuti
>
>
>> Bill
>>
>> Reuti wrote:
>>
>>> Hi,
>>>
>>> On 11.05.2006 at 14:41, Bill Knebel wrote:
>>>
>>>> I have three nodes out of six in all.q that are subordinate to
>>>> bootstrap.q. When jobs are completed on bootstrap.q, the three
>>>> subordinate nodes in all.q remain in the capital "S" state and do
>>>> not accept jobs. I have tried forcing unsuspend on all.q, but
>>>> grid engine says the queue is not in the suspend state. This is
>>>> a relatively recent occurrence. In the past, when jobs completed
>>>> on bootstrap.q, the three nodes on all.q that were affected
>>>> returned to the normal state and began accepting and running
>>>> jobs. Any ideas?
>>>
>>>
>>>
>>> Are these dual-CPU nodes, and how many slots are defined for each
>>> queue and how many are used? What is the detailed setting for the
>>> subordinate queue and subordination?
>>>
>>> -- Reuti
>>>
>>>> Bill
>>>>
>>>> -- 
>>>> Bill Knebel, PharmD, Ph.D.
>>>> Principal Scientist
>>>> Metrum Research Group
>>>> 2 Tunxis Road
>>>> Suite 112
>>>> Tariffville, CT 06081
>>>> email: billk at metrumrg.com
>>>> tel: (860) 930-1370
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
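
As an aside on Reuti's slot arithmetic in the quoted exchange: with a plain "subordinate_list all.q" entry and no threshold, all.q on a host is only suspended once every bootstrap.q slot on that host is in use. The sketch below models that rule for one host with 2 slots in each queue, as described in the thread. The function names are hypothetical, and this is a toy model of the rule, not Grid Engine code.

```python
from typing import Optional


def subordinate_suspended(used_slots: int, total_slots: int,
                          threshold: Optional[int] = None) -> bool:
    """True if the subordinate queue instance on a host would be suspended.

    With no threshold, suspension happens only when all slots of the
    superordinate queue are filled; with a threshold n (as in "all.q=n"),
    it happens as soon as n slots are filled.
    """
    effective = total_slots if threshold is None else threshold
    return used_slots >= effective


def max_running_jobs(all_q_slots: int, bootstrap_slots: int,
                     threshold: Optional[int] = None) -> int:
    """Worst-case number of jobs running at once on one host."""
    worst = 0
    for used in range(bootstrap_slots + 1):
        if subordinate_suspended(used, bootstrap_slots, threshold):
            running = used                 # all.q jobs are suspended
        else:
            running = used + all_q_slots   # all.q may still be full
        worst = max(worst, running)
    return worst


print(max_running_jobs(2, 2))               # no threshold
print(max_running_jobs(2, 2, threshold=1))  # "all.q=1"
```

Under this model the no-threshold setup allows three running jobs on a dual-CPU node, while a threshold of 1 caps it at two, which matches the one-job-per-CPU goal stated earlier in the thread.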

-- 
Bill Knebel, PharmD, Ph.D.
Principal Scientist
Metrum Research Group
2 Tunxis Road
Suite 112
Tariffville, CT 06081
email: billk at metrumrg.com
tel: (860) 930-1370
