[GE users] Initial findings with classic / BDB / RPC spooling over NFSv3 / NFSv4 on Linux
dan.templeton at sun.com
Fri Jan 8 19:42:44 GMT 2010
Really interesting info. Thanks for posting!
> I know I've only scratched the surface on this, but I thought I'd share
> some findings I've seen today when performing *simplistic* comparisons in
> job throughput on a RHEL 5.4 platform with spooling options suitable for a
> qmaster/shadowd install.
> I've not accounted for all sorts of things, but I found it interesting and
> I hope others will also.
> * Comments on Berkeley BDB spooling on a Linux NFSv4 share:
> I've not tried Kerberos authentication - only basic AUTH_SYS.
> Although BDB spooling over NFS isn't supported, no immediate SGE problems
> are noticeable, though I've not been seriously looking for them yet.
> However, I have noticed an interesting change vs. NFSv3 when running an
> HA NFSv4 server.
> If you perform a manual failover by the sequence of:
> 1) Unconfigure the IP address on server
> 2) Unexport filesystem on server
> 3) umount filesystem on server
> 4) mount filesystem on failover server
> 5) Export filesystem on failover server
> 6) Configure IP address on failover server
> A simple umount at step (3) doesn't work if a client had files
> open. It seems to eventually time out; I need to take a closer
> look at this.
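The six failover steps above can be sketched as a shell runbook. The interface name, export path, device, and service IP below are illustrative assumptions, not details from the original post; substitute your own, and note this sketch omits the fencing/STONITH a real HA setup needs.

```shell
# Manual NFS HA failover sketch -- eth0, /export/sge, /dev/sdb1 and
# 192.168.1.50 are hypothetical placeholders.

# On the active server:
ip addr del 192.168.1.50/24 dev eth0   # 1) drop the service IP
exportfs -u '*:/export/sge'            # 2) unexport the filesystem
umount /export/sge                     # 3) may hang while clients hold
                                       #    files open (the time-out seen
                                       #    above); 'umount -l' is a lazy
                                       #    escape hatch

# On the failover server:
mount /dev/sdb1 /export/sge            # 4) mount the shared storage
exportfs -o rw,sync '*:/export/sge'    # 5) re-export it
ip addr add 192.168.1.50/24 dev eth0   # 6) bring up the service IP
```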
> * Throughput: Submit a 10,000 job task array consisting of a job script
> with just the "echo" command, to execute on four 16-core servers:
> Run  Spool    NFS  Execd   NFS server  qmaster  Time   Jobs/
>      type     ver  spool   load av.    (%cpu)   (sec)  second
> ===  =======  ===  ======  ==========  =======  =====  ======
>  1   classic   3   shared  ~11         ?        1098     9
>  2   bdb       4   shared  ~11         ~5%       486    21
>  3   rpc/bdb   3   shared  ~11         ~3%       816    12
>  4   classic   3   local   ~0.5        ~20%      427    23
>  5   bdb       4   local   ~0.5        ~5%       418    24
>  6   rpc/bdb   3   local   ~0.5        ~3%       625    16
> Only one result of each type was collected, so these figures are
> indicative only.
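For reference, the jobs/second column is just the 10,000 tasks divided by the elapsed wall-clock time. A quick sketch — the qsub line is an assumed reconstruction of the test job, not necessarily the author's exact command:

```shell
# Assumed shape of the benchmark submission (illustrative only):
#   qsub -t 1-10000 -b y -o /dev/null -j y /bin/echo hello
#
# Deriving the throughput figure for run 1 (1098 s elapsed):
jobs=10000
secs=1098
echo $((jobs / secs))    # integer jobs/second -> 9
```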
> Simple observations:
> 1) Putting execd spool directories locally stops my NFS server and
> disks from being under strain.
> 2) "qstat" and friends were very sluggish when the qmaster was under
> strain. Using bdb in some form or other keeps my qmaster from being
> put under such strain.
> * Setup:
> NFS server : SunFire X4150, 2x 2.5GHz Harpertown, 16GB RAM, StorageTek
> 2530 disk array. RHEL5.4 x86_64.
> qmaster : SunFire X4150, 2x 2.5GHz Harpertown, 16GB RAM.
> RHEL5.4 x86_64.
> execd servers: SunFire X4440, 4x 2.7GHz Shanghai. CentOS 5.3 x86_64.
> "classic" - classic spooling, SGE_ROOT mounted NFSv3 by qmaster and execds.
> "bdb" - bdb spooling, SGE_ROOT mounted NFSv4 by qmaster, NFSv3 by execds.
> "rpc/bdb" - bdb spooling, SGE_ROOT mounted NFSv3 by qmaster and execds,
> bdb spooling server running on NFS server.
> schedule_interval 0:0:1
> flush_submit_sec 0
> flush_finish_sec 0
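These scheduler tweaks (an aggressive one-second schedule interval plus immediate flushing on job submit/finish) live in the SGE scheduler configuration; a sketch of checking them with qconf:

```shell
# View the current scheduler configuration (read-only):
qconf -ssconf | egrep 'schedule_interval|flush_submit_sec|flush_finish_sec'
# Editing is done interactively with 'qconf -msconf' (opens $EDITOR).
```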
> If you got to here, thanks for reading :)