[GE users] strange delays in gridengine commands

ruppert dieter_ruppert at siemens.com
Mon Feb 1 11:03:59 GMT 2010


To follow up on this: I more or less accidentally found the apparent
cause. A number of obsolete execution hosts were still registered;
they had been switched off and replaced some months ago. This only
became a problem now, when name resolution for these hosts stopped
working, which seems to have introduced the delays.

The problem disappeared when I removed these obsolete hosts from the
execution host list.
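To catch hosts like these before they cause trouble, one could time a name lookup for every registered execution host. The following is a minimal sketch in Python 3, not part of Grid Engine: the script name `check_hosts.py` and the 1-second threshold are arbitrary assumptions. It uses the local system resolver via `socket.gethostbyname`, which may not follow exactly the same lookup path as sge_qmaster itself, so treat the result as a hint rather than proof.

```python
import socket
import sys
import time
from typing import Callable, List, Tuple

# Assumption: a lookup slower than this many seconds is treated as suspicious.
SLOW_SECONDS = 1.0

Result = Tuple[str, bool, float]  # (host name, resolved?, seconds taken)


def check_hosts(hosts: List[str],
                resolve: Callable[[str], str] = socket.gethostbyname) -> List[Result]:
    """Time one forward lookup per host using the given resolver."""
    results: List[Result] = []
    for host in hosts:
        start = time.monotonic()
        try:
            resolve(host)
            ok = True
        except OSError:  # socket.gaierror, socket.herror and timeouts subclass OSError
            ok = False
        results.append((host, ok, time.monotonic() - start))
    return results


def suspicious(results: List[Result], slow: float = SLOW_SECONDS) -> List[str]:
    """Return hosts whose lookup failed or took longer than `slow` seconds."""
    return [host for host, ok, elapsed in results if not ok or elapsed > slow]


if __name__ == "__main__":
    hosts = sys.argv[1:]
    if not hosts:
        print("usage: check_hosts.py HOST [HOST ...]")
    results = check_hosts(hosts)
    for host, ok, elapsed in results:
        print(f"{host:30} {'ok' if ok else 'FAILED':6} {elapsed:6.2f}s")
    bad = suspicious(results)
    if bad:
        print("suspicious:", " ".join(bad))
```

The host names can be taken from the execution host list, e.g. `python3 check_hosts.py $(qconf -sel)`. Hosts reported as FAILED or unusually slow are candidates for removal with `qconf -de <hostname>`, as was done here by hand.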

Regarding the DTrace script: it works with minor adjustments for 6.0u6.
I had to use the initial revision and remove the probes for do_c_ack,
since this function apparently does not exist in 6.0u6.

Regards
D. Ruppert

>The DTrace script was added in 6.0u8, I believe.  There's no reason it 
>shouldn't work with 6.0u6.  Just grab the scripts and readme from here:
>
>http://gridengine.sunsource.net/source/browse/gridengine/source/scripts/dtrace/
>
>You'll probably need help interpreting the results, so just send us what 
>you get.
>
>Daniel
>
>On 01/28/10 07:40, ruppert wrote:
>> Yes, we use Solaris 10, but I was not aware of a DTrace script (we
>> use SGE 6.0u6). Is this part of newer SGE versions? Would it also
>> work with 6.0u6?
>>
>> D. Ruppert
>>
>>    
>>> I assume you're not running on Solaris.  If you were, the DTrace script
>>> that comes with Grid Engine would point you in the right direction.
>>> Instead, you can try running the qmaster with debugging turned on and
>>> redirected to a file, but that will itself cause some minor qmaster
>>> performance issues.  See:
>>>
>>> http://blogs.sun.com/templedf/entry/using_debugging_output
>>>
>>> Daniel
>>>
>>> On 01/28/10 03:48, ruppert wrote:
>>>      
>>>> Hi,
>>>>
>>>> for a few days now, we have been experiencing strange delays when
>>>> executing gridengine commands. For example, a simple 'qhost' or
>>>> 'qstat' command, which usually takes less than one second to
>>>> complete, takes almost one minute. The same command, issued a few
>>>> minutes later, may complete without this delay.
>>>>
>>>> This is not load related: we have only about 60 single-processor
>>>> execution nodes (Solaris10/Sparc), the load on the qmaster host is
>>>> usually around 0.1, and it also happens when all execution hosts
>>>> are idle. The SGE version is 6.0u6. There is nothing obviously
>>>> suspicious in the various messages files.
>>>>
>>>> How can I investigate this further? Is there a trace facility
>>>> that could reveal where these commands spend their time?
>>>>
>>>> From a simple 'truss qhost', I see that the client sends a binary
>>>> packet to the qmaster port on the qmaster host, followed by a long
>>>> delay in "pollsys" (probably a select) before a response
>>>> arrives:
>>>>
>>>> ...
>>>> write(6, 0x1002F33A0, 99)			= 99
>>>>      <   m i h   v e r s i o n = " 0 . 1 ">   <   m i d>   1<   / m i d>   <
>>>>      d l>   4 6 1<   / d l>   <   d f>   b i n<   / d f>   <   m a t>   a c k<
>>>>      / m a t>   <   t a g>   2<   / t a g>   <   r i d>   0<   / r i d>   <   / m
>>>>      i h>
>>>> write(6, 0x1002F43B0, 461)			= 461
>>>>     \0\0\0\01002\0\0\0\0\001\0\0\00310\01001\0\0\0\0\0\0\0\0\0\0\001
>>>>     ...
>>>>     d d d d ~ * d b d , d * d * ~ * h ~ h , ~ * g n d ~ g = g d g l
>>>>     \0\0\0\005\0\0\0\0\0\0\0\0
>>>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) (sleeping...)
>>>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) = 0
>>>>
>>>> ... many pollsys ...
>>>>
>>>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) (sleeping...)
>>>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) = 0
>>>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) = 0
>>>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) = 1
>>>> read(6, 0x1002F2390, 22)			= 22
>>>>      <   g m s h>   <   d l>   9 7<   / d l>   <   / g m s
>>>> read(6, " h", 1)				= 1
>>>> read(6, ">", 1)				= 1
>>>> read(6, 0x1002F2390, 97)			= 97
>>>>      <   m i h   v e r s i o n = " 0 . 1 ">   <   m i d>   1<   / m i d>   <
>>>>      d l>   3 5<   / d l>   <   d f>   a m<   / d f>   <   m a t>   n a k<   / m
>>>>      a t>   <   t a g>   0<   / t a g>   <   r i d>   0<   / r i d>   <   / m i h
>>>>      >
>>>>
>>>> Is it possible to somehow trace the qmaster side?
>>>>
>>>> Regards
>>>> D. Ruppert
>>>>
>>>> ------------------------------------------------------
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=241480
>>>>
>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>> ----------------------------------
>> ePS & RTS Automation Software GmbH
>> Benzstr. 1
>> D-71272 Renningen
>> Managing Directors: Gernot Kral, Frank Lubnau
>> Registered office: Renningen
>> Registry court: Leonberg HRB 253220
>>
>

