[GE users] implementation opinion / suggestions

jlb jlb at salilab.org
Fri Jan 29 16:51:25 GMT 2010

On Wed, 27 Jan 2010 at 11:51am, jching wrote

> After reviewing some of the valuable performance data provided by Mark 
> Dixon in a previous post, it looks like there is a significant 
> performance gain when running local bdb -vs- rpc/bdb but the rpc/bdb 
> option gives us an additional failover option with the shadow master.

I'm in the midst of tests similar to what Mark did but on our production 
cluster (~550 nodes, ~3000 cores).  I hope to have some numbers next week.

> We would love to hear any opinions and/or experience people have... we 
> also had a few questions for the large cluster (200+ nodes) community:

> 1. What is your implementation? (Local or Remote BDB w/ Shadow? Type of physical hardware?  Network?)

For a while, I was using classic spooling to a NetApp filer and a shadow 
master.  The NetApp also provided user storage, however.  When, shall we 
say, over-enthusiastic users hammered it into submission, causing high 
latencies, schedd would crash (this was 6.1).  This happened often enough 
that I had to move the spool to the SGE master and get rid of the shadow 
master.  So I'm currently testing 6.2 w/ classic spooling over NFSv3 and 
BDB spooling over NFSv4, both to a dedicated NFS server.

> 3. Types of jobs? (short or long period of runtime)

We have a large mixture of jobs -- from array jobs whose tasks run for 
single-digit seconds (grrr) to single jobs which almost hit our runtime 
limit (2 weeks).

> 4. Any performance issues?

As above, putting the spool on a remote disk (even a decently beefy 
NetApp) shared with user jobs proved problematic.

> 5. Do you run DRBD?


Joshua Baker-LePain
QB3 Shared Cluster Sysadmin

