[GE users] implementation opinion / suggestions

dougalb dougal.lists at gmail.com
Thu Jan 28 18:07:45 GMT 2010



We recently reworked one SGE cluster because of performance issues.
We moved from classic spooling on shared Panasas storage to a single
qmaster with Berkeley DB (BDB) spooling. It is a night-and-day
difference! We also moved to local spool directories instead of NFS.
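A minimal Python sketch of how one might check which qmaster spooling setup a cell is using. It assumes the cell's bootstrap file (normally `$SGE_ROOT/$SGE_CELL/common/bootstrap`) contains whitespace-separated `spooling_method` and `spooling_params` lines. The idea that RPC BDB spooling shows up as a `host:directory` value in `spooling_params` is an assumption, and the helper names `parse_bootstrap` and `spooling_mode` are hypothetical; verify against your own install. Note that it cannot tell whether a local path actually sits on NFS.

```python
"""Classify the qmaster spooling setup described by an SGE bootstrap file.

Assumptions (not confirmed by the post): the file uses "key value" lines,
and RPC BDB spooling is recognizable as "host:directory" in spooling_params,
while local BDB spooling uses a plain path.
"""


def parse_bootstrap(text: str) -> dict[str, str]:
    """Return the key/value pairs of a bootstrap file, skipping comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def spooling_mode(settings: dict[str, str]) -> str:
    """Hypothetical heuristic: classic vs. local BDB vs. RPC BDB."""
    method = settings.get("spooling_method", "")
    params = settings.get("spooling_params", "")
    if method == "classic":
        return "classic (flat files)"
    if method == "berkeleydb":
        # Assumed form for an RPC server: "host:dir"; a plain path means local.
        if ":" in params and not params.startswith("/"):
            return "berkeleydb (RPC server)"
        return "berkeleydb (local)"
    return "unknown"


EXAMPLE = """\
# hypothetical example values
spooling_method berkeleydb
spooling_lib libspoolb
spooling_params /var/spool/sge/spooldb
"""

if __name__ == "__main__":
    print(spooling_mode(parse_bootstrap(EXAMPLE)))
```

For the example text above this prints `berkeleydb (local)`.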

128 Nehalem nodes = 1024 slots

Mostly small jobs, but also some 200-core parallel jobs.

Just under 1M jobs a month.
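To put "just under 1M jobs a month" on 1024 slots in perspective, here is a quick back-of-envelope calculation in Python. The 30-day month and evenly spread submissions are simplifying assumptions (real workloads are burstier), and `throughput` is a hypothetical helper name.

```python
"""Back-of-envelope job throughput for the cluster described above.

Inputs come from the post (1024 slots, ~1M jobs a month); the 30-day month
and uniform arrival rate are simplifying assumptions.
"""


def throughput(jobs_per_month: float, slots: int, days: float = 30.0) -> dict[str, float]:
    """Average job rates implied by a monthly job count."""
    per_day = jobs_per_month / days
    return {
        "per_day": per_day,
        "per_hour": per_day / 24,
        "per_minute": per_day / (24 * 60),
        "per_slot_per_day": per_day / slots,
    }


if __name__ == "__main__":
    for name, value in throughput(1_000_000, 1024).items():
        print(f"{name}: {value:,.1f}")
```

With 1M jobs and 1024 slots this works out to roughly 33,000 jobs a day, about 23 jobs a minute, or about 33 jobs per slot per day, which gives a feel for how hard the qmaster's spooling is exercised.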

We are investigating a more highly available (HA) setup, probably built
around an HA service with direct-attached shared storage for the BDB spool.

This doc is also worth a read (thanks, Dan!):

https://dct.sun.com/dct/forms/reg_us_0309_861_0.jsp

-Dougal

On Wed, Jan 27, 2010 at 8:51 PM, jching <jching at bbn.com> wrote:
> Hi,
>
> We are currently planning our next SGE implementation and wanted to get the community's opinion on local BDB vs. RPC BDB spooling. The setup will be ~2000 cores (500 nodes) with a combination of short and long jobs that will run in the queue <insert approximate # of jobs here>.
>
> After reviewing some of the valuable performance data provided by Mark Dixon in a previous post, it looks like local BDB is significantly faster than RPC BDB, but RPC BDB gives us an additional failover option with the shadow master. We would love to hear any opinions and/or experience people have. We also had a few questions for the large-cluster (200+ nodes) community:
>
> 1. What is your implementation? (Local or Remote BDB w/ Shadow? Type of physical hardware?  Network?)
> 2. How many nodes?
> 3. Types of jobs? (short or long runtimes)
> 4. Any performance issues?
> 5. Do you run DRBD?
>
> Thanks in advance for any valuable feedback!
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=241351
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=241550



