[GE users] SGE scheduler/qmaster performance

rayson rayrayson at gmail.com
Thu Apr 22 18:10:16 BST 2010


Maybe you can try scheduler profiling, or use qping to dump the runtime
status of the qmaster.

http://gridengine.info/2006/09/13/performance-profiling-information-added-to-cvs
http://wiki.gridengine.info/wiki/index.php/GridEngine_qping
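
To put numbers on the slowdown while you try the above, a small timing
wrapper may help. This is only a minimal Python sketch, not part of SGE:
the helper name time_command and its runs parameter are made up for
illustration, and it assumes the command being timed (for example qstat)
is on the PATH.

```python
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock seconds of each run.

    Hypothetical helper for illustration; argv is a list such as ["qstat"].
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        timings.append(time.perf_counter() - start)
    return timings
```

For instance, time_command(["qstat"]) would time three plain qstat calls.
Comparing that with a qping against the qmaster (something like
["qping", "-info", "<host>", "<port>", "qmaster", "1"], where host and
port are placeholders for your site) may hint at whether the delay is in
the qmaster itself or somewhere else.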

Rayson



On 4/22/10, cjf001 <john.foley at motorola.com> wrote:
> SGEers:
>
> We're running SGEv6.2u2 here, and just this week I've started to
> notice (and so have the users :(  ) very slow response from the
> SGE qmaster. Most commands are very slow to respond, but the test
> I'm using is simply running a qstat. Right now it's taking about
> 20 seconds to respond.
>
> Now, one thing that *may* have changed recently is the number of
> jobs in the system (i.e., running + pending jobs). I never really
> tracked this number before, but right now, with the ~20 second
> qstat response time, we have about 7700 jobs in the system. This
> could be a lot more than we're used to, as one of our groups has
> been submitting a ton of jobs recently.
>
> So, my questions are....
>
> - does it make sense that the qmaster/scheduler response would slow
>   down with more jobs in the system ?
>
> - does anyone else run a comparable system, with this many or more
>   jobs in the system, and if so, what are your qstat times ?
>
> - if this doesn't make sense (i.e., isn't normal), what should I be looking
>   for ?  The qmaster's messages file doesn't show anything abnormal.
>   The system's message file (RHELv5.2) doesn't show anything abnormal.
>   The rest of the cluster/network/etc seems to be running normally (i.e.,
>   there don't appear to be any network/NIS/DNS type issues). Is there a
>   way to narrow down where the time is being spent ?
>
>      Thanks for any thoughts !
>
>              John
>
>
> FYI, "top" on the qmaster machine shows this right now....
>
>
> top - 09:24:54 up 1 day,  1:00,  5 users,  load average: 1.48, 1.44, 1.38
> Tasks: 107 total,   2 running, 105 sleeping,   0 stopped,   0 zombie
> Cpu(s): 20.3%us, 14.6%sy,  0.0%ni, 57.6%id,  0.2%wa,  0.2%hi,  7.2%si,  0.0%st
> Mem:   3954768k total,  1594728k used,  2360040k free,   179320k buffers
> Swap: 10241428k total,        0k used, 10241428k free,   545956k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  6509 sgeadm    15   0  606m 380m 3224 S   76  9.8   1212:26 sge_qmaster
>  6334 root      24   0  129m 2520 1428 S    0  0.1   0:51.74 automount
>  9159 root      18   0 22568 3032 1448 S    0  0.1   0:00.02 qstat
>  9165 root      15   0 18908 1412 1052 R    0  0.0   0:00.02 top
>     1 root      15   0 10324  760  632 S    0  0.0   0:00.42 init
>
>
>
>
>
> --
> ###########################################################################
> # John Foley                          # Location:  IL93-E1-21S            #
> # IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
> # Antenna & Mechanical Simulation Grp #    Email: john.foley at motorola.com #
> # Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
> # 600 North US Highway 45             #      Fax: (847) 523-5767          #
> # Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
> ###########################################################################
>               (this email sent using SeaMonkey on Windows)
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=254466
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=254474



