[GE users] sge_shepherd eating up lots of cpu

pollinger harald.pollinger at sun.com
Fri Nov 21 11:30:24 GMT 2008


Hi Brian,

If it says 6.2 without an update, then it should be just 6.2, with no 
update applied. Did you install some other version?

I think you ran into Issue 2775 
(http://gridengine.sunsource.net/issues/show_bug.cgi?id=2775). It should 
be fixed with 6.2 update 1.

Regards,
Harald

brs wrote:
> Oops... The pattern seems to be old jobs that no longer exist in the 
> queue. Not sure how those shepherds are still hanging around. It's no 
> longer an issue for us, but if it's interesting for anyone else here, 
> let me know if you need more info.
> 
> -Brian
> 
> brs wrote:
>> Hi, all,
>>
>> I've seen, in some instances, sge_shepherd using lots of CPU time:
>>
>> Output from 'top'
>> ----
>> 10757 root      16   0 83060 2092 1676 R  162  0.0  40665:51 sge_shepherd
>> 12422 root      16   0 83056 2084 1676 R  152  0.0  39756:57 sge_shepherd
>>  8700 root      16   0 83052 2080 1676 R  150  0.0  40704:49 sge_shepherd
>>
>> I attached strace to one of the processes and saw lots of this:
>>
>> ----
>> strace -f -p <pid>
>> ...
>> [pid 12427] futex(0x51beed0, FUTEX_WAKE, 1) = 0
>> [pid 12427] futex(0x51be3d0, FUTEX_WAKE, 1 <unfinished ...>
>> [pid 12422] futex(0x51be3d0, FUTEX_WAIT, 2, NULL <unfinished ...>
>> [pid 12427] <... futex resumed> )       = 0
>> [pid 12422] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
>> [pid 12427] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
>> [pid 12422] futex(0x51be3d0, FUTEX_WAKE, 1 <unfinished ...>
>> [pid 12427] <... clock_gettime resumed> {1227202263, 839309000}) = 0
>> [pid 12422] <... futex resumed> )       = 0
>> [pid 12427] futex(0x51bef34, FUTEX_WAIT, 1156052373, {0, 999985000} <unfinished ...>
>> [pid 12422] futex(0x51bef34, FUTEX_WAKE_OP, 1, 1, 0x51bef30, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_EQ, 0} <unfinished ...>
>> [pid 12427] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
>> [pid 12422] <... futex resumed> )       = 0
>> [pid 12427] futex(0x51beed0, FUTEX_WAIT, 2, NULL <unfinished ...>
>> [pid 12422] futex(0x51beed0, FUTEX_WAKE, 1 <unfinished ...>
>> [pid 12427] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
>> [pid 12422] <... futex resumed> )       = 0
>> [pid 12427] futex(0x51beed0, FUTEX_WAKE, 1) = 0
>> [pid 12427] futex(0x51be3d0, FUTEX_WAKE, 1 <unfinished ...>
>> [pid 12422] futex(0x51be3d0, FUTEX_WAIT, 2, NULL <unfinished ...>
>> [pid 12427] <... futex resumed> )       = 0
>> [pid 12422] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
>> [pid 12427] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
>> [pid 12422] futex(0x51be3d0, FUTEX_WAKE, 1 <unfinished ...>
>> [pid 12427] <... clock_gettime resumed> {1227202263, 839502000}) = 0
>> [pid 12422] <... futex resumed> )       = 0
>> [pid 12427] futex(0x51bef34, FUTEX_WAIT, 1156052375, {0, 999981000} <unfinished ...>
>> [pid 12422] futex(0x51bef34, FUTEX_WAKE_OP, 1, 1, 0x51bef30, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_EQ, 0} <unfinished ...>
>> [pid 12427] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
>> ...
>> Anyone have any clues for me?   I'll keep trying to diagnose here.  The 
>> nodes have 8 slots, 1/cpu.  This particular execd host was running 6 
>> SLAVE tasks for several parallel jobs.
>>
>> I'm on 6.2, update level indeterminate (it seems that no longer shows 
>> up in the version strings). I downloaded this version from Sun's 
>> download site.
>>
>> -Brian Smith
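
For anyone who wants to check a saved trace for the same pattern, here is a
minimal sketch that tallies syscalls and error returns per thread from
`strace -f` output like the excerpt quoted above. It is only an
illustration: the function name `tally_strace` and the regular expressions
are assumptions made for this example, not part of Grid Engine or strace,
and it expects each strace record on a single, unquoted line.

```python
import re
from collections import Counter

# Line shapes produced by `strace -f`, e.g.
#   [pid 12427] futex(0x51beed0, FUTEX_WAKE, 1) = 0
#   [pid 12422] <... futex resumed> )       = -1 EAGAIN (Resource ...)
PID_RE = re.compile(r"^\[pid\s+(\d+)\]\s+(.*)$")
START_RE = re.compile(r"^(\w+)\(")
RESUMED_RE = re.compile(r"^<\.\.\. (\w+) resumed>")
ERRNO_RE = re.compile(r"=\s+-1\s+(E[A-Z]+)")


def tally_strace(lines):
    """Count syscalls started and errors returned, per pid.

    Returns (calls, errors): calls is keyed by (pid, syscall),
    errors by (pid, syscall, errno name).
    """
    calls, errors = Counter(), Counter()
    for raw in lines:
        m = PID_RE.match(raw.strip())
        if not m:
            continue  # skip "..." markers and anything without a pid prefix
        pid, rest = int(m.group(1)), m.group(2)
        start = START_RE.match(rest)
        resumed = RESUMED_RE.match(rest)
        if start:
            name = start.group(1)
            calls[(pid, name)] += 1  # count each call once, where it begins
        elif resumed:
            name = resumed.group(1)  # completion of an earlier call
        else:
            continue
        err = ERRNO_RE.search(rest)
        if err:
            errors[(pid, name, err.group(1))] += 1
    return calls, errors


# A few lines taken from the trace in this thread.
sample = [
    "[pid 12427] futex(0x51beed0, FUTEX_WAKE, 1) = 0",
    "[pid 12427] futex(0x51be3d0, FUTEX_WAKE, 1 <unfinished ...>",
    "[pid 12422] futex(0x51be3d0, FUTEX_WAIT, 2, NULL <unfinished ...>",
    "[pid 12427] <... futex resumed> )       = 0",
    "[pid 12422] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)",
    "[pid 12427] clock_gettime(CLOCK_REALTIME,  <unfinished ...>",
]
calls, errors = tally_strace(sample)
for key, count in sorted(calls.items()):
    print(key, count)
```

If nearly every call in a long trace is a futex, with a large share of
FUTEX_WAIT calls returning EAGAIN, that would be consistent with two threads
spinning on the same locks rather than sleeping, which is what the excerpt
suggests.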


-- 
Sun Microsystems GmbH         Harald Pollinger
Dr.-Leo-Ritter-Str. 7         N1 Grid Engine Engineering
D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
Germany                       Fax: +49 (0)941 3075-222  (x60222)
http://www.sun.com/gridware
mailto:harald.pollinger at sun.com
Registered office: Sonnenallee 1, D-85551 Kirchheim-Heimstetten
District Court of Munich (Amtsgericht Muenchen): HRB 161028
Managing directors: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Chairman of the supervisory board: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89336
