[GE users] high CPU load for sge_qmaster

Göran Uddeborg uddeborg at carmen.se
Fri Apr 29 17:20:26 BST 2005


Stephan Grell - Sun Germany - SSG - Software Engineer writes:
> The qping output shows that there are 11 messages waiting to be
> processed. These are most likely finished jobs or new ones. In other
> words, there is something to do for the qmaster.

I don't know how many jobs were running at the time.  But right now,
we have two running jobs, one waiting in the queue, and twelve old
jobs in an error state which haven't been cleaned away.  No job has
been started or finished for quite some time now.  Qping says there
are no messages in the buffers:

    18:06 wake> qping -info wake 536 qmaster 1
    04/29/2005 18:06:42:
    SIRM version:             0.1
    SIRM message id:          1
    start time:               04/28/2005 14:52:09 (1114692729)
    run time [s]:             98073
    messages in read buffer:  0
    messages in write buffer: 0
    nr. of connected clients: 113
    status:                   0
    info:                     EDT: R (0.00) | TET: R (0.83) | MT: R (0.00) | SIGT: R (98067.91) | ok


(I've repeated the command a number of times, with the same result.)
But the sge_qmaster thread is still spinning at 100% on one CPU!  This
is most definitely not caused by a busy system, at least not busy in
the sense that a lot of jobs are running.
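
For anyone who wants to repeat this check without reading the output by
eye, here is a minimal sketch that parses the "qping -info" text shown
above.  The function names (parse_qping_info, buffers_empty,
thread_states) are my own inventions for illustration, not part of SGE,
and the sketch only assumes the "key: value" layout visible in the
output above; it does not interpret what the numbers in the info line
mean.

```python
import re

# Sample taken verbatim from the qping -info output quoted above.
SAMPLE = """\
04/29/2005 18:06:42:
SIRM version:             0.1
SIRM message id:          1
start time:               04/28/2005 14:52:09 (1114692729)
run time [s]:             98073
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 113
status:                   0
info:                     EDT: R (0.00) | TET: R (0.83) | MT: R (0.00) | SIGT: R (98067.91) | ok
"""

# A field line starts with a letter, has a key up to the first colon,
# then whitespace and a value.  The leading timestamp line starts with
# a digit and has no value, so it is skipped.
FIELD_RE = re.compile(r"([A-Za-z][^:]*):\s+(\S.*)$")

# One entry of the info line, e.g. "SIGT: R (98067.91)".
THREAD_RE = re.compile(r"(\w+): (\w+) \(([\d.]+)\)")


def parse_qping_info(text):
    """Return the 'key: value' lines of qping -info output as a dict of strings."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def buffers_empty(fields):
    """True if both the read and the write buffer report zero messages."""
    return (int(fields["messages in read buffer"]) == 0
            and int(fields["messages in write buffer"]) == 0)


def thread_states(fields):
    """Split the info line into (name, state, number) tuples, uninterpreted."""
    return [(name, state, float(value))
            for name, state, value in THREAD_RE.findall(fields["info"])]


if __name__ == "__main__":
    info = parse_qping_info(SAMPLE)
    print("buffers empty:", buffers_empty(info))
    for entry in thread_states(info):
        print(entry)
```

To run it against a live qmaster rather than the pasted sample, the text
could be captured from the same command as above, for example with
subprocess.run(["qping", "-info", "wake", "536", "qmaster", "1"],
capture_output=True, text=True).stdout, and polled in a loop.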

Qmaster may have STARTED to spin when the system was busy, but it
doesn't calm down when the load does.

> If you want to know more about the origin of the pending messages, use
> qping -dump.

I attach the output from this command.  I don't understand enough of
the internals of SGE to judge what it says.  What is your
interpretation?




    [ Part 2: "Attached Text" ]
