[GE users] implementation opinion / suggestions

templedf dan.templeton at sun.com
Fri Jan 29 17:03:34 GMT 2010


Please do share your results when you have them.  I'll be very interested in your findings.

Daniel

jlb <jlb at salilab.org> wrote:

>On Wed, 27 Jan 2010 at 11:51am, jching wrote:
>
>> After reviewing some of the valuable performance data provided by Mark 
>> Dixon in a previous post, it looks like there is a significant 
>> performance gain when running local bdb -vs- rpc/bdb but the rpc/bdb 
>> option gives us an additional failover option with the shadow master.
>
>I'm in the midst of tests similar to what Mark did but on our production 
>cluster (~550 nodes, ~3000 cores).  I hope to have some numbers next week.
>
>> We would love to hear any opinions and/or experience people have... we 
>> also had a few questions for the large cluster (200+ nodes) community:
>
>> 1. What is your implementation? (Local or Remote BDB w/ Shadow? Type of physical hardware?  Network?)
>
>For a while, I was using classic spooling to a NetApp filer and a shadow 
>master.  The NetApp also provided user storage, however.  When, shall we 
>way, over-enthusiastic users hammered it into submission, causing high 
>latencies, schedd would crash (this was 6.1).  This happened often enough 
>that I had to move the spool to the SGE master and get rid of the shadow 
>master.  So I'm currently testing 6.2 w/ classic spooling over NFSv3 and 
>BDB spooling over NFSv4, both to a dedicated NFS server.
>
>> 3. Types of jobs? (short or long period of runtime)
>
>We have a large mixture of jobs -- from array jobs of single-digit second 
>tasks (grrr) to single jobs which almost hit our runtime limit (2 weeks).
>
>> 4. Any performance issues?
>
>As above, putting the spool on a remote disk (even a decently beefy 
>NetApp) shared with user jobs proved problematic.
>
>> 5. Do you run DRBD?
>
>Nope.
>
>-- 
>Joshua Baker-LePain
>QB3 Shared Cluster Sysadmin
>UCSF
>
>------------------------------------------------------
>http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=241764
>
>To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=241768
