[GE users] SGE scheduler/qmaster performance

cjf001 john.foley at motorola.com
Thu Apr 22 15:27:20 BST 2010


SGEers:

We're running SGE 6.2u2 here, and just this week I've started to
notice (and so have the users :( ) very slow response from the
SGE qmaster. Most commands are slow to respond, but the test
I'm using is simply running a qstat. Right now it's taking about
20 seconds to respond.

Now, one thing that *may* have changed recently is the number of
jobs in the system (i.e., running + pending jobs). I never really
tracked this number before, but right now, with the ~20 second
qstat response time, we have about 7700 jobs in the system. This
could be a lot more than we're used to, as one of our groups has
been submitting a ton of jobs recently.
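
Since I haven't been tracking either number, here is a minimal sketch of how
the qstat response time and the job count could be logged together. It is only
a sketch under assumptions: the function names are made up for illustration,
and it assumes that `qstat -u "*"` lists jobs for all users with a two-line
header followed by one line per job entry, which may not match every
installation.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments); return (elapsed_seconds, stdout_lines)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    elapsed = time.perf_counter() - start
    return elapsed, result.stdout.splitlines()


def count_jobs(lines, header_lines=2):
    """Estimate the job count from qstat-style output.

    Assumption: the output starts with `header_lines` header lines and then
    has one line per job entry. Empty output counts as zero jobs.
    """
    return max(len(lines) - header_lines, 0)


def sample(cmd):
    """Return one timestamped log line with the elapsed time and job estimate."""
    elapsed, lines = time_command(cmd)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} elapsed={elapsed:.2f}s jobs~{count_jobs(lines)}"
```

Printing `sample(["qstat", "-u", "*"])` every few minutes (from cron, say) and
appending it to a file would show whether the response time really climbs with
the number of jobs in the system.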

So, my questions are....

- does it make sense that the qmaster/scheduler response would slow
   down with more jobs in the system ?

- does anyone else run a comparable system, with this many or more
   jobs in the system, and if so, what are your qstat times ?

- if this doesn't make sense (i.e., isn't normal), what should I be looking
   for? The qmaster's messages file doesn't show anything abnormal.
   The system's messages file (RHEL 5.2) doesn't show anything abnormal.
   The rest of the cluster/network/etc. seems to be running normally (i.e.,
   there don't appear to be any network/NIS/DNS type issues). Is there a
   way to narrow down where the time is being spent?

      Thanks for any thoughts !

              John


FYI, "top" on the qmaster machine shows this right now....


top - 09:24:54 up 1 day,  1:00,  5 users,  load average: 1.48, 1.44, 1.38
Tasks: 107 total,   2 running, 105 sleeping,   0 stopped,   0 zombie
Cpu(s): 20.3%us, 14.6%sy,  0.0%ni, 57.6%id,  0.2%wa,  0.2%hi,  7.2%si,  0.0%st
Mem:   3954768k total,  1594728k used,  2360040k free,   179320k buffers
Swap: 10241428k total,        0k used, 10241428k free,   545956k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  6509 sgeadm    15   0  606m 380m 3224 S   76  9.8   1212:26 sge_qmaster
  6334 root      24   0  129m 2520 1428 S    0  0.1   0:51.74 automount
  9159 root      18   0 22568 3032 1448 S    0  0.1   0:00.02 qstat
  9165 root      15   0 18908 1412 1052 R    0  0.0   0:00.02 top
     1 root      15   0 10324  760  632 S    0  0.0   0:00.42 init
  




-- 
###########################################################################
# John Foley                          # Location:  IL93-E1-21S            #
# IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
# Antenna & Mechanical Simulation Grp #    Email: john.foley at motorola.com #
# Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
# 600 North US Highway 45             #      Fax: (847) 523-5767          #
# Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
###########################################################################
               (this email sent using SeaMonkey on Windows)
