[GE users] weird "orders queue version is not uptodate" messages in qmaster log

Xavier MACHENAUD xavier.machenaud at st.com
Thu Mar 10 09:08:18 GMT 2005


I have made a little progress on my problem.

Here is the context:

My goal is to avoid taking too many licenses.
I first implemented the solution described in the Flexlm HOWTO, but
because the complex is not consumable, this solution results in
oversubmitting jobs that use the license.
And as my licenses support the Flexlm queueing feature, lots of license
requests get queued in Flexlm, which penalizes non-grid users of these
licenses.
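
A toy Python model can make the oversubmission point concrete. It is not Grid Engine code: `dispatch` and the rule that every job requests exactly one license are illustrative assumptions. With a non-consumable complex, each pending job is checked against the same advertised value, so all of them pass; with a consumable, every dispatch is subtracted before the next job is checked.

```python
def dispatch(pending_jobs: int, licenses_free: int, consumable: bool) -> int:
    """Return how many jobs a toy scheduler starts in one pass.

    Assumption: every job requests exactly one license (lic_XXX=1).
    """
    remaining = licenses_free
    started = 0
    for _ in range(pending_jobs):
        if remaining < 1:
            continue  # the advertised value cannot satisfy this request
        started += 1
        if consumable:
            # A consumable is debited per dispatched job.
            remaining -= 1
        # A non-consumable value is never debited, so the next job sees
        # the same count and passes the same check.
    return started


if __name__ == "__main__":
    for consumable in (False, True):
        n = dispatch(pending_jobs=10, licenses_free=2, consumable=consumable)
        print(f"consumable={consumable}: started {n} of 10 jobs")
```

Run as a script, it reports 10 of 10 jobs started in the non-consumable case and 2 of 10 in the consumable case.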

So I thought about using a consumable resource, but I wasn't able to make
it work with the Flexlm HOWTO method.
The only way I found was to have a script daemon do the following:
    * query Flexlm to get the number of free licenses plus the number of
licenses used by the grid's jobs.
    * set the complex value by doing "qconf -mattr exechost
complex_values lic_XXX=<used by grid+free>"
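
The two daemon steps above could look roughly like the following Python sketch. Several details are assumptions rather than part of the setup described here: the `lmstat -a` line format matched by `FEATURE_RE`, the hypothetical helpers `parse_lmstat`, `complex_value`, `qconf_argv` and `update`, and the `grid_usage` mapping (how the count of licenses held by grid jobs is obtained is left open). `qconf -mattr` also takes the object instance to modify as its last argument; the command above does not show one, so it is the `host` parameter here (for example `global`).

```python
import re
import subprocess

# Assumed `lmstat -a` feature line, e.g.:
#   Users of XXX:  (Total of 10 licenses issued;  Total of 4 licenses in use)
FEATURE_RE = re.compile(
    r"Users of (\S+):\s+\(Total of (\d+) licenses? issued;\s+"
    r"Total of (\d+) licenses? in use\)"
)


def parse_lmstat(text: str) -> dict[str, tuple[int, int]]:
    """Map each feature name to (issued, in_use)."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in FEATURE_RE.finditer(text)}


def complex_value(issued: int, in_use: int, used_by_grid: int) -> int:
    """Licenses the grid may use: currently free plus already held by grid jobs."""
    return max(issued - in_use, 0) + used_by_grid


def qconf_argv(feature: str, value: int, host: str) -> list[str]:
    """Build the qconf call; `host` (e.g. "global") is an assumed instance."""
    return ["qconf", "-mattr", "exechost", "complex_values",
            f"lic_{feature}={value}", host]


def update(lmstat_text: str, grid_usage: dict[str, int], host: str,
           last: dict[str, int], dry_run: bool = True) -> list[list[str]]:
    """Run one polling cycle and return the qconf commands issued.

    Features whose value is unchanged since the previous cycle are skipped.
    """
    commands = []
    for feature, (issued, in_use) in parse_lmstat(lmstat_text).items():
        value = complex_value(issued, in_use, grid_usage.get(feature, 0))
        if last.get(feature) == value:
            continue
        argv = qconf_argv(feature, value, host)
        if not dry_run:
            subprocess.run(argv, check=True)
        commands.append(argv)
        last[feature] = value
    return commands
```

With `dry_run=True` the function only returns the `qconf` command lines it would run. Skipping features whose value has not changed is a design choice of this sketch, so the execution host object is only modified when the license count actually moves.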

I've got one script daemon per Flexlm vendor running in parallel (3
scripts monitoring 5 licenses).

This seemed to work fine at first, but:
  * I'm seeing lots of "orders queue version is not uptodate" messages
in my qmaster log
  * after a while, the scheduler does not work anymore (no jobs get
dispatched).

This is a really serious issue as my grid is not working properly.

Do you know if I'm doing something wrong or if it's a bug?

Thanks!

Xavier

Stephan Grell - Sun Germany - SSG - Software Engineer wrote:

>Xavier MACHENAUD wrote:
>
>>Stephan,
>>
>>Just one more question: do you know how to recover from the blocked
>>state (when queued jobs are not started anymore)?
>>
>Restart the scheduler. Could you enable profiling in the scheduler
>and post the output for the case where the scheduler is blocked?
>
>Thanks,
>
>Stephan
>
>>Xavier
>>
>>Terry Lalonde wrote:
>>
>>>I too am seeing this with u1.
>>>
>>>I do not plan to upgrade soon.
>>>
>>>>-----Original Message-----
>>>>From: Stephan Grell - Sun Germany - SSG - Software Engineer
>>>>[mailto:stephan.grell at sun.com]
>>>>Sent: Thursday, March 03, 2005 5:03 AM
>>>>To: users at gridengine.sunsource.net
>>>>Subject: Re: [GE users] weird "orders queue version is not uptodate"
>>>>messages in qmaster log
>>>>
>>>>Xavier MACHENAUD wrote:
>>>>
>>>>>FYI, I also got the messages in u1 and reached the point where no jobs
>>>>>were scheduled anymore.
>>>>>
>>>>>I upgraded to u3 4 days ago. I still have the messages, but so far
>>>>>I haven't reached the blocking state.
>>>>>But I'm not very confident about not being blocked again :-(
>>>>>
>>>>Well, if you are brave, you could try the maintrunk. :-)
>>>>This problem should be fixed. It would be nice to have an external test
>>>>for it.
>>>>
>>>>Stephan
>>>>
>>>>>Xavier
>>>>>
>>>>>Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>>>>>
>>>>>>Hi Terry,
>>>>>>
>>>>>>It's a way to ensure that the scheduler uses the latest data for
>>>>>>its scheduling decisions. Every object has a version number. When
>>>>>>the scheduler generates an update for an object (for example, a job
>>>>>>start), it includes the object's version number in that order to
>>>>>>inform the qmaster about the basis of its decision.
>>>>>>
>>>>>>Unfortunately, there is a bug in u3 that delays the delivery of
>>>>>>update events. Therefore, the scheduler is not working on the most
>>>>>>recent data, and the qmaster logs the error messages you have noticed.
>>>>>>
>>>>>>This can cause:
>>>>>>- lost usage in the share tree
>>>>>>- attempts to start a job twice
>>>>>>- ignored job start orders
>>>>>>
>>>>>>The likelihood of a delay grows with the size of the share tree.
>>>>>>
>>>>>>Cheers,
>>>>>>Stephan
>>>>>>
>>>>>>Terry Lalonde wrote:
>>>>>>
>>>>>>>Interesting:  I just noticed the same thing today???
>>>>>>>
>>>>>>>02/28/2005 15:49:17|qmaster|wstlalonde|E|orders user/project version (88808) is not uptodate (88809) for user/project "ddodd"
>>>>>>>02/28/2005 15:49:32|qmaster|wstlalonde|E|orders user/project version (88808) is not uptodate (88809) for user/project "ddodd"
>>>>>>>
>>>>>>>repeated over and over.
>>>>>>>
>>>>>>>It's some accounting check I think.
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Xavier MACHENAUD [mailto:xavier.machenaud at st.com]
>>>>>>>>Sent: Monday, February 28, 2005 6:34 AM
>>>>>>>>To: users at gridengine.sunsource.net
>>>>>>>>Subject: [GE users] weird "orders queue version is not uptodate" messages
>>>>>>>>in qmaster log
>>>>>>>>
>>>>>>>>Hi,
>>>>>>>>
>>>>>>>>I'm seeing these kinds of messages in my qmaster log:
>>>>>>>>
>>>>>>>>02/28/2005 12:30:45|qmaster|crxu71|E|orders queue version (1668) is not uptodate (1669) for queue "run@crxu76"
>>>>>>>>02/28/2005 12:30:51|qmaster|crxu71|E|orders queue version (1670) is not uptodate (1671) for queue "run@crxu76"
>>>>>>>>
>>>>>>>>Do you know what the problem is?
>>>>>>>>
>>>>>>>>Thanks,
>>>>>>>>
>>>>>>>>Xavier
>>>>>>>>
>>>>>>>>---------------------------------------------------------------------
>>>>>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net

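For Stephan's request to enable profiling: in Grid Engine 6 the scheduler configuration has a `params` field that should accept `PROFILE=1`, and the configuration can be dumped with `qconf -ssconf` and reloaded from a file with `qconf -Msconf`. The Python helper below is a hypothetical sketch that rewrites the dumped text; the field layout it expects (one name/value pair per line, comma-separated `params`, `none` when empty) is an assumption worth checking against sched_conf(5) for your version. Editing interactively with `qconf -msconf` works just as well.

```python
import sys


def enable_profile(sconf_text: str) -> str:
    """Return `qconf -ssconf` text with PROFILE=1 set in the params line.

    Assumes one "name  value" pair per line, comma-separated entries in
    params, and the literal value "none" when params is empty.
    """
    out = []
    for line in sconf_text.splitlines():
        parts = line.split(None, 1)
        if not parts or parts[0] != "params":
            out.append(line)
            continue
        value = parts[1].strip() if len(parts) > 1 else ""
        entries = [] if value.lower() in ("", "none") else value.split(",")
        # Drop any existing PROFILE entry, then switch profiling on.
        entries = [e for e in entries if not e.upper().startswith("PROFILE=")]
        entries.append("PROFILE=1")
        out.append("params " + ",".join(entries))
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 enable_profile.py sconf.txt > sconf.new
    with open(sys.argv[1]) as f:
        sys.stdout.write(enable_profile(f.read()))
```

For example: `qconf -ssconf > sconf.txt`, then `python3 enable_profile.py sconf.txt > sconf.new` and `qconf -Msconf sconf.new` (the script and file names are hypothetical).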

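Stephan's description of the version check can be modelled in a few lines of Python. This is only an illustration of the idea, not qmaster code: `QueueObject`, `Order` and `accept` are hypothetical names, and the rejection message merely imitates the log lines quoted in this thread. The scheduler bases an order on the version it last saw; if the object changed on the qmaster in the meantime and the update event has not reached the scheduler yet, the versions differ and the order is rejected.

```python
from dataclasses import dataclass


@dataclass
class QueueObject:
    """Toy stand-in for an object held by the qmaster."""
    name: str
    version: int = 0

    def modify(self) -> None:
        # Every change accepted by the qmaster bumps the version.
        self.version += 1


@dataclass
class Order:
    """A scheduler order remembers the object version it was based on."""
    queue: str
    version_seen: int


def accept(order: Order, obj: QueueObject, log: list[str]) -> bool:
    """Reject an order whose version differs from the object's current one."""
    if order.version_seen != obj.version:
        log.append(f"orders queue version ({order.version_seen}) is not "
                   f'uptodate ({obj.version}) for queue "{obj.name}"')
        return False
    return True


if __name__ == "__main__":
    queue = QueueObject("run@crxu76", version=1668)
    seen = queue.version   # scheduler's (possibly stale) view
    queue.modify()         # change applied; the event is still in flight
    messages: list[str] = []
    accept(Order(queue.name, seen), queue, messages)
    print(messages[0])
```

Run as a script, it prints `orders queue version (1668) is not uptodate (1669) for queue "run@crxu76"`, the same shape as the qmaster messages above. In this model, anything that bumps versions faster than events reach the scheduler makes such rejections more likely.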

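To see how often these errors occur and for which objects, the qmaster messages file can be tallied. The sketch assumes the `date time|daemon|host|severity|text` layout visible in the excerpts above and the wording of the "not uptodate" messages; `parse_line` and `count_stale_orders` are hypothetical helpers, and the messages file path is passed on the command line rather than guessed.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Wording of the errors quoted in this thread, e.g.
#   orders queue version (1668) is not uptodate (1669) for queue "..."
NOT_UPTODATE = re.compile(
    r'orders (?P<kind>.+?) version \((?P<seen>\d+)\) is not uptodate '
    r'\((?P<current>\d+)\) for (?P=kind) "(?P<name>[^"]+)"'
)


def parse_line(line: str):
    """Split a messages-file line into (time, daemon, host, severity, text)."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    try:
        when = datetime.strptime(parts[0], "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return (when, *parts[1:])


def count_stale_orders(lines) -> Counter:
    """Count error-level 'not uptodate' messages per (object kind, name)."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields is None or fields[3] != "E":
            continue
        match = NOT_UPTODATE.search(fields[4])
        if match:
            counts[(match["kind"], match["name"])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for (kind, name), n in count_stale_orders(f).most_common():
            print(f"{n:6d}  {kind} {name}")
```

Counting per object rather than per line makes it easier to tell whether the errors concentrate on a few queues or users, as in the two excerpts above.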