[GE users] implementation opinion / suggestions
jlb at salilab.org
Fri Jan 29 16:51:25 GMT 2010
On Wed, 27 Jan 2010 at 11:51am, jching wrote:
> After reviewing some of the valuable performance data provided by Mark
> Dixon in a previous post, it looks like there is a significant
> performance gain when running local bdb -vs- rpc/bdb but the rpc/bdb
> option gives us an additional failover option with the shadow master.
I'm in the midst of tests similar to what Mark did but on our production
cluster (~550 nodes, ~3000 cores). I hope to have some numbers next week.
> We would love to hear any opinions and/or experience people have... we
> also had a few questions for the large cluster (200+ nodes) community:
> 1. What is your implementation? (Local or Remote BDB w/ Shadow? Type of physical hardware? Network?)
For a while, I was using classic spooling to a NetApp filer and a shadow
master. The NetApp also provided user storage, however. When, shall we
say, over-enthusiastic users hammered it into submission, causing high
latencies, schedd would crash (this was 6.1). This happened often enough
that I had to move the spool to the SGE master and get rid of the shadow
master. So I'm currently testing 6.2 w/ classic spooling over NFSv3 and
BDB spooling over NFSv4, both to a dedicated NFS server.
> 3. Types of jobs? (short or long period of runtime)
We have a large mixture of jobs -- from array jobs of single-digit second
tasks (grrr) to single jobs which almost hit our runtime limit (2 weeks).
> 4. Any performance issues?
As above, putting the spool on a remote disk (even a decently beefy
NetApp) shared with user jobs proved problematic.
> 5. Do you run DRBD?
QB3 Shared Cluster Sysadmin