[GE users] weird "orders queue version is not uptodate" messages in qmaster log

Xavier MACHENAUD xavier.machenaud at st.com
Fri Mar 4 16:46:05 GMT 2005


I've just reached the point where no more jobs are being scheduled.
But right after I switched on the job scheduling info (it was off), the
jobs were scheduled again.

I guess the next time it happens, I'll be able to get some info...

It looks like the problem happens every week on my grid (and always on
Friday afternoon: just another proof of Murphy's law).
I can hardly wait for a fix...
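To see whether the stale-version errors pile up before each weekly blockage, the qmaster messages file can be scanned for them. The Python sketch below is only an illustration: the regular expression is modeled on the log lines quoted further down in this thread (date, time, `qmaster`, host, severity `E`, then `orders <kind> version (N) is not uptodate (M) for <kind> "<name>"`). The function name `summarize`, the per-day grouping and the command-line usage are assumptions, not part of Grid Engine, and the log format may differ between versions.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Modeled on the qmaster error lines quoted in this thread, e.g.
# 02/28/2005 12:30:45|qmaster|crxu71|E|orders queue version (1668) is not uptodate (1669) for queue "run@crxu76"
# Assumption: other Grid Engine versions may format these lines differently.
STALE_ORDER = re.compile(
    r'^(?P<day>\d{2}/\d{2}/\d{4}) (?P<time>[^|]+)\|qmaster\|(?P<host>[^|]*)\|E\|'
    r'orders (?P<kind>.+?) version \((?P<sent>\d+)\) is not uptodate '
    r'\((?P<current>\d+)\) for (?P=kind) "(?P<name>[^"]*)"'
)


def summarize(lines):
    """Count stale-version order errors per object and per day (hypothetical helper)."""
    per_object = Counter()  # key: (object kind, object name)
    per_day = Counter()     # key: MM/DD/YYYY as written in the log
    for line in lines:
        match = STALE_ORDER.match(line.strip())
        if match is None:
            continue  # not a stale-version error; ignore it
        per_object[(match.group("kind"), match.group("name"))] += 1
        per_day[match.group("day")] += 1
    return per_object, per_day


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python stale_orders.py <qmaster messages file>
    with open(sys.argv[1], errors="replace") as log:
        per_object, per_day = summarize(log)
    # Sort days chronologically rather than as strings.
    for day, count in sorted(per_day.items(),
                             key=lambda item: datetime.strptime(item[0], "%m/%d/%Y")):
        print(f"{day}  {count}")
    for (kind, name), count in per_object.most_common(10):
        print(f"{count:6d}  {kind} {name}")
```

The location of the messages file depends on the installation's spool directory. A spike in the per-day count shortly before jobs stop being scheduled would be consistent with the delayed update events described in the quoted thread, but it would not prove that this is the cause.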

Xavier

Stephan Grell - Sun Germany - SSG - Software Engineer wrote:

>Xavier MACHENAUD wrote:
>
>>Stephan,
>>
>>Just one more question: do you know how to recover from the blocked
>>state (when queued jobs are no longer started)?
>>
>Restart the scheduler. Could you enable profiling in the scheduler
>and post the output for the case where the scheduler is blocked?
>
>Thanks,
>
>Stephan
>
>>Xavier
>>
>>Terry Lalonde wrote:
>>
>>>I too am seeing this with u1.  
>>>
>>>I do not plan to upgrade soon.
>>>
>>>>-----Original Message-----
>>>>From: Stephan Grell - Sun Germany - SSG - Software Engineer
>>>>[mailto:stephan.grell at sun.com]
>>>>Sent: Thursday, March 03, 2005 5:03 AM
>>>>To: users at gridengine.sunsource.net
>>>>Subject: Re: [GE users] weird "orders queue version is not uptodate" messages in qmaster log
>>>>
>>>>Xavier MACHENAUD wrote:
>>>>
>>>>>FYI, I also got the messages in u1 and reached the point where no jobs
>>>>>were scheduled anymore.
>>>>>
>>>>>I upgraded to u3 4 days ago. I still get the messages but, so far,
>>>>>I haven't reached the blocking state.
>>>>>But I'm not very confident about not being blocked again :-(
>>>>>
>>>>Well, if you are brave, you could try the main trunk. :-)
>>>>This problem should be fixed there. It would be nice to have an external
>>>>test for it.
>>>>
>>>>Stephan
>>>>
>>>>>Xavier
>>>>>
>>>>>Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>>>>>
>>>>>>Hi Terry,
>>>>>>
>>>>>>It's a way to ensure that the scheduler uses the latest data for
>>>>>>its scheduling decisions. Every object has a version number. When
>>>>>>the scheduler generates an update for an object (for example, a job
>>>>>>start), it includes the object's version number in that order to tell
>>>>>>the qmaster which data its decision was based on.
>>>>>>
>>>>>>Unfortunately, there is a bug in u3 which delays the delivery of
>>>>>>update events. As a result, the scheduler is not working on the most
>>>>>>recent data, and the qmaster logs the error messages you have noticed.
>>>>>>
>>>>>>This can cause:
>>>>>>- lost usage in the share tree
>>>>>>- attempts to start a job twice
>>>>>>- ignored job start orders
>>>>>>
>>>>>>The likelihood of a delay grows with the size of the share tree.
>>>>>>
>>>>>>Cheers,
>>>>>>Stephan
>>>>>>
>>>>>>Terry Lalonde wrote:
>>>>>>
>>>>>>>Interesting:  I just noticed the same thing today???
>>>>>>>
>>>>>>>02/28/2005 15:49:17|qmaster|wstlalonde|E|orders user/project version (88808) is not uptodate (88809) for user/project "ddodd"
>>>>>>>02/28/2005 15:49:32|qmaster|wstlalonde|E|orders user/project version (88808) is not uptodate (88809) for user/project "ddodd"
>>>>>>>
>>>>>>>repeated over and over.
>>>>>>>
>>>>>>>It's some kind of accounting check, I think.
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Xavier MACHENAUD [mailto:xavier.machenaud at st.com]
>>>>>>>>Sent: Monday, February 28, 2005 6:34 AM
>>>>>>>>To: users at gridengine.sunsource.net
>>>>>>>>Subject: [GE users] weird "orders queue version is not uptodate" messages in qmaster log
>>>>>>>>
>>>>>>>>Hi,
>>>>>>>>
>>>>>>>>I'm seeing these kinds of messages in my qmaster log:
>>>>>>>>
>>>>>>>>02/28/2005 12:30:45|qmaster|crxu71|E|orders queue version (1668) is not uptodate (1669) for queue "run@crxu76"
>>>>>>>>02/28/2005 12:30:51|qmaster|crxu71|E|orders queue version (1670) is not uptodate (1671) for queue "run@crxu76"
>>>>>>>>
>>>>>>>>Do you know what the problem is?
>>>>>>>>
>>>>>>>>Thanks,
>>>>>>>>
>>>>>>>>Xavier
>>>>>>>>
>>>>>>>>
>>>>>>>>---------------------------------------------------------------------
>>>>>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net

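The version check explained in the quoted thread (every object carries a version number, and each scheduler order carries the version its decision was based on) is a form of optimistic concurrency control. The Python sketch below illustrates only that idea: the class and method names are hypothetical, it is not Grid Engine's implementation, and refusing a stale order is an assumption, since the thread does not spell out what the qmaster does with such an order beyond logging it.

```python
from dataclasses import dataclass, field


@dataclass
class MasterObject:
    """Hypothetical qmaster-side object (queue, user/project, ...) with a version counter."""
    kind: str
    name: str
    version: int = 0

    def modify(self) -> None:
        # Every change on the master side bumps the version number.
        self.version += 1


@dataclass
class Order:
    """Hypothetical scheduler order, tagged with the version it was based on."""
    kind: str
    name: str
    based_on_version: int


@dataclass
class Master:
    objects: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def add(self, obj: MasterObject) -> None:
        self.objects[(obj.kind, obj.name)] = obj

    def current_version(self, kind: str, name: str) -> int:
        return self.objects[(kind, name)].version

    def handle(self, order: Order) -> bool:
        obj = self.objects[(order.kind, order.name)]
        if order.based_on_version != obj.version:
            # The scheduler decided on stale data: log it in the style of the
            # messages quoted in this thread and (assumption) refuse the order.
            self.log.append(
                f"orders {order.kind} version ({order.based_on_version}) "
                f'is not uptodate ({obj.version}) for {order.kind} "{order.name}"'
            )
            return False
        return True


if __name__ == "__main__":
    master = Master()
    queue = MasterObject("queue", "run@crxu76", version=1668)
    master.add(queue)

    # The scheduler copies version 1668 into its view of the queue ...
    seen = master.current_version("queue", "run@crxu76")
    # ... the queue changes on the master, but the update event is delayed ...
    queue.modify()
    # ... so the order built on the old view no longer matches.
    print(master.handle(Order("queue", "run@crxu76", seen)))  # False
    print(master.log[0])
```

In this model, the delayed event delivery described for u3 corresponds to the window between `queue.modify()` and the scheduler refreshing its copy: the longer that window, the more orders reach the master with an outdated version.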