[GE users] subordinate queue that stays suspended

Reuti reuti at staff.uni-marburg.de
Tue May 16 15:06:37 BST 2006


On 16.05.2006, at 15:11, Bill Knebel wrote:

> Reuti,
>
> I thought a bit more explanation might be helpful. I am still seeing
> the problem. The only way to get the nodes out of the "S" state is
> to remove them from the queue definition of all.q and then
> immediately add them back in. However, as soon as they go into the
> "S" state the next time we run a bootstrap.q job, they stay that way
> until I go through the same process again.
> Any ideas?

I must admit that I have never heard of this before. Which versions of
SGE and which operating system are you using? Maybe it's the opposite of:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1937

Does anything show up with:

qstat -explain A

or in the messages files of the qmaster and the execution node?

I suggest filing an issue. - Reuti
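A minimal shell sketch for collecting the information asked for above. It assumes the SGE 6.x `qstat -f` layout (queue-instance lines of the form `queue@host qtype used/tot load_avg arch states`, with the states column left out when empty) and the default qmaster messages location `$SGE_ROOT/$SGE_CELL/spool/qmaster/messages`; the function name `list_subordinate_suspended` is hypothetical.

```shell
#!/bin/sh
# Sketch: gather the diagnostics asked for above.
# Assumptions (not from the thread): SGE 6.x "qstat -f" layout and the
# default classic-spooling location of the qmaster messages file.

# Read "qstat -f" output on stdin and print each queue instance whose states
# column contains an uppercase "S" (suspended by subordination). Queue-instance
# lines start with "queue@host"; states is the sixth and last column and is
# missing entirely when the instance has no state flags.
list_subordinate_suspended() {
    awk '$1 ~ /@/ && NF >= 6 && $NF ~ /S/ { print $1, $NF }'
}

# Only talk to the cluster when the SGE client tools are on the PATH.
if command -v qstat >/dev/null 2>&1; then
    qstat -f | list_subordinate_suspended
    # "-explain A" prints the reason for a suspend-alarm state, if any
    qstat -f -explain A
    # Assumed default path; adjust if the qmaster spool directory differs
    msgs="$SGE_ROOT/${SGE_CELL:-default}/spool/qmaster/messages"
    if [ -r "$msgs" ]; then
        tail -n 50 "$msgs"
    fi
fi
```

The awk filter can also be checked offline by piping a saved `qstat -f` listing into `list_subordinate_suspended`; an instance stuck the way Bill describes would be listed with an `S` in its states column.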


> Bill
>
> Reuti wrote:
>
>> On 11.05.2006, at 17:31, Bill Knebel wrote:
>>
>>> Reuti,
>>>
>>> The nodes are dual-CPU Xeons. In all.q each node (14, 15, and 16)
>>> is defined as having 2 slots. The same slot count is defined in
>>> bootstrap.q for these nodes. We only want one job running per CPU
>>> on each node at any one time. In the qconf configuration of
>>> bootstrap.q, the "subordinate_list" entry is all.q. That is the
>>> extent of the subordinate queue setup.
>>>
>>> Our goal is to suspend the use of nodes 14, 15, and 16 in all.q
>>> when jobs are submitted to bootstrap.q. bootstrap.q is configured
>>> to use only nodes 14, 15, and 16.
>>
>>
>> But this way you could get three jobs on a node: one in bootstrap.q
>> and two in all.q, as all.q is only suspended once both slots in
>> bootstrap.q are used. Anyway, this isn't your issue. Do qstat -f and
>> qhost -q still show anything running on these machines?
>>
>> -- Reuti
>>
>>
>>> Bill
>>>
>>> Reuti wrote:
>>>
>>>> Hi,
>>>>
>>>> On 11.05.2006, at 14:41, Bill Knebel wrote:
>>>>
>>>>> I have three nodes (out of six) in all.q that are subordinate to
>>>>> bootstrap.q. When jobs complete on bootstrap.q, the three
>>>>> subordinate nodes in all.q remain in the capital "S" state and do
>>>>> not accept jobs. I have tried forcing an unsuspend on all.q, but
>>>>> Grid Engine says the queue is not in the suspended state. This is
>>>>> a relatively recent occurrence. In the past, when jobs completed
>>>>> on bootstrap.q, the three affected nodes in all.q returned to the
>>>>> normal state and began accepting and running jobs. Any ideas?
>>>>
>>>> Are these dual-CPU nodes? How many slots are defined for each
>>>> queue, and how many are used? What are the detailed settings for
>>>> the subordinate queue and the subordination?
>>>>
>>>> -- Reuti
>>>>
>>>>> Bill
>>>>>
>>>>> -- 
>>>>> Bill Knebel, PharmD, Ph.D.
>>>>> Principal Scientist
>>>>> Metrum Research Group
>>>>> 2 Tunxis Road
>>>>> Suite 112
>>>>> Tariffville, CT 06081
>>>>> email: billk at metrumrg.com
>>>>> tel: (860) 930-1370
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>>

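On Reuti's point in the thread that, with `subordinate_list all.q` and 2 slots per host in both queues, all.q is only suspended once both bootstrap.q slots are busy (so a dual-CPU node can run three jobs): queue_conf(5) documents an optional threshold of the form `queue=N`, e.g. `subordinate_list all.q=1`, which suspends the subordinated queue as soon as N slots of the superordinated queue are filled. The toy model below is not SGE code and does not touch the stuck "S" state itself; the names `SLOTS`, `all_q_suspended` and `max_running_on_host` are hypothetical, and it only works through that arithmetic.

```shell
#!/bin/sh
# Toy model (not SGE code) of the subordination threshold from queue_conf(5).
# Assumption: 2 slots per host in both all.q and bootstrap.q, as in the thread.
SLOTS=2

# Succeeds if all.q is suspended on a host with $1 busy bootstrap.q slots.
# $2 is the optional N from "subordinate_list all.q=N"; without it,
# suspension needs all SLOTS of bootstrap.q to be filled.
all_q_suspended() {
    [ "$1" -ge "${2:-$SLOTS}" ]
}

# Print the most jobs that can be running on one host given $1 running
# bootstrap.q jobs: those jobs plus a full all.q while all.q is not suspended.
max_running_on_host() {
    if all_q_suspended "$@"; then
        echo "$1"
    else
        echo $(( $1 + SLOTS ))
    fi
}

max_running_on_host 1     # subordinate_list all.q   -> prints 3
max_running_on_host 1 1   # subordinate_list all.q=1 -> prints 1
```

With the threshold set to 1, a single bootstrap.q job is enough to suspend all.q on that host, which matches the stated goal of one job per CPU.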