[GE users] strange delays in gridengine commands

ruppert dieter_ruppert at siemens.com
Thu Jan 28 15:40:01 GMT 2010



Yes, we use Solaris 10, but I was not aware of a DTrace script (we
use SGE 6.0u6). Is it part of newer SGE versions? Might it also work
with 6.0u6?

D. Ruppert

>I assume you're not running on Solaris.  If you were, the DTrace script 
>that comes with Grid Engine would point you in the right direction.  
>Instead, you can try running the qmaster with debugging turned on and 
>redirected to a file, but that will itself cause some minor qmaster 
>performance issues.  See:
>
>http://blogs.sun.com/templedf/entry/using_debugging_output
>
>Daniel
>
>On 01/28/10 03:48, ruppert wrote:
>> Hi,
>>
>> for the past few days we have been experiencing strange delays when
>> executing gridengine commands. For example, a simple 'qhost' or 'qstat'
>> command, which usually completes in less than one second, takes almost
>> one minute. The same command, issued a few minutes later, may complete
>> without this delay.
>>
>> This is not load related: we have only about 60 single-processor
>> execution nodes (Solaris10/Sparc), the load on the qmaster host is
>> usually around 0.1, and the delay also occurs when all execution
>> hosts are idle. The SGE version is 6.0u6. There is nothing obviously
>> suspicious in the various messages files.
>>
>> How could I proceed to further investigate this? Is there any trace
>> facility which could reveal where these commands spend their time?
>>
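Because the delay comes and goes (well under a second on one run, almost a minute a few minutes later), it can help to sample it over a period of time before reaching for tracing tools, so that a slow run can be matched against the qmaster messages file afterwards. The following is a minimal Python sketch, not an SGE tool: `time_command` and `slow_runs` are hypothetical helpers, and the 5-second threshold is an arbitrary assumption.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5, pause=0.0):
    """Run argv `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for i in range(runs):
        start = time.monotonic()
        # Output is discarded: only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.monotonic() - start)
        if pause and i < runs - 1:
            time.sleep(pause)  # spread samples out, since the delay comes and goes
    return durations


def slow_runs(durations, threshold=5.0):
    """Return the indices of runs that took longer than `threshold` seconds."""
    return [i for i, d in enumerate(durations) if d > threshold]


if __name__ == "__main__":
    # Demo on a trivial command; on an SGE client host use e.g. ["qstat"] instead.
    samples = time_command([sys.executable, "-c", "pass"], runs=3)
    print("min %.2fs  median %.2fs  max %.2fs" % (
        min(samples), statistics.median(samples), max(samples)))
    print("slow runs:", slow_runs(samples))
```

On a host with the SGE client commands in PATH, something like `slow_runs(time_command(["qstat"], runs=20, pause=30.0))` would list which of twenty samples, taken 30 seconds apart, hit the delay.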
>> From a simple 'truss qhost' I see that the client transmits a binary
>> packet to the qmaster port on the qmaster host, followed by a long
>> wait in "pollsys" (probably a select) before a response arrives:
>>
>> ...
>> write(6, 0x1002F33A0, 99)			= 99
>>     <  m i h   v e r s i o n = " 0 . 1 ">  <  m i d>  1<  / m i d>  <
>>     d l>  4 6 1<  / d l>  <  d f>  b i n<  / d f>  <  m a t>  a c k<
>>     / m a t>  <  t a g>  2<  / t a g>  <  r i d>  0<  / r i d>  <  / m
>>     i h>
>> write(6, 0x1002F43B0, 461)			= 461
>>    \0\0\0\01002\0\0\0\0\001\0\0\00310\01001\0\0\0\0\0\0\0\0\0\0\001
>>    ...
>>    d d d d ~ * d b d , d * d * ~ * h ~ h , ~ * g n d ~ g = g d g l
>>    \0\0\0\005\0\0\0\0\0\0\0\0
>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) (sleeping...)
>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) = 0
>>
>> ... many pollsys ...
>>
>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) (sleeping...)
>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) = 0
>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) = 0
>> pollsys(0xFFFFFFFF7FFF8100, 1, 0xFFFFFFFF7FFF8200, 0x00000000) = 1
>> read(6, 0x1002F2390, 22)			= 22
>>     <  g m s h>  <  d l>  9 7<  / d l>  <  / g m s
>> read(6, " h", 1)				= 1
>> read(6, ">", 1)				= 1
>> read(6, 0x1002F2390, 97)			= 97
>>     <  m i h   v e r s i o n = " 0 . 1 ">  <  m i d>  1<  / m i d>  <
>>     d l>  3 5<  / d l>  <  d f>  a m<  / d f>  <  m a t>  n a k<  / m
>>     a t>  <  t a g>  0<  / t a g>  <  r i d>  0<  / r i d>  <  / m i h
>>     >
>>
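In the truss output, each `pollsys(...) = 0` is a poll that timed out with no descriptor ready, and the final `= 1` is the moment the reply became readable on descriptor 6. The client is idle while it waits, which suggests the time is spent on the qmaster side (or on the network path to it) rather than in qhost itself. The loop below is a minimal Python sketch of that client-side pattern using the standard `select.poll` interface, assuming an already connected socket; `wait_for_reply` and its timeout values are illustrative and not taken from the SGE sources.

```python
import select
import socket
import threading
import time


def wait_for_reply(sock, timeout_ms=1000, max_wait_s=60.0):
    """Poll `sock` until it is readable, mirroring the pollsys loop in the trace.

    Returns (number_of_timed_out_polls, seconds_waited).
    Raises TimeoutError if nothing arrives within max_wait_s.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    empty_polls = 0
    start = time.monotonic()
    while True:
        events = poller.poll(timeout_ms)  # [] on timeout, like pollsys(...) = 0
        if events:                        # one ready fd, like pollsys(...) = 1
            return empty_polls, time.monotonic() - start
        empty_polls += 1
        if time.monotonic() - start > max_wait_s:
            raise TimeoutError("no reply within %.1f s" % max_wait_s)


if __name__ == "__main__":
    client, server = socket.socketpair()
    # Simulate a slow server that answers only after 0.35 s.
    threading.Timer(0.35, server.sendall, args=(b"reply",)).start()
    polls, waited = wait_for_reply(client, timeout_ms=100)
    print("%d empty polls, reply after %.2f s: %r" % (polls, waited, client.recv(16)))
```

Counting the empty polls this way gives the same picture as counting the `= 0` lines in the truss output: the longer the qmaster takes to answer, the more timed-out polls appear before the read.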
>> Is it possible to somehow trace the qmaster side?
>>
>> Regards
>> D. Ruppert
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=241480
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>
>------------------------------------------------------
>http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=241516

----------------------------------
ePS & RTS Automation Software GmbH
Benzstr. 1
D-71272 Renningen
Managing Directors: Gernot Kral, Frank Lubnau
Registered office: Renningen
Registration court: Leonberg HRB 253220

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=241520
