[GE users] Initial findings with classic / BDB / RPC spooling over NFSv3 / NFSv4 on Linux

craffi dag at sonsorol.org
Fri Jan 8 16:18:50 GMT 2010


Excellent read. Thanks!

-Chris


ccaamad wrote:
> I know I've only scratched the surface on this, but I thought I'd share
> some findings from today's *simplistic* comparisons of job throughput on
> a RHEL 5.4 platform, with spooling options suitable for a
> qmaster/shadowd install.
>
> I've not accounted for all sorts of things, but I found it interesting
> and I hope others will too.
>
>
> * Comments on Berkeley DB (BDB) spooling on a Linux NFSv4 share:
>
>     I've not tried Kerberos authentication - only basic AUTH_SYS.
>
>     Although it's not supported, no immediate SGE problems are
>     noticeable, though I've not been looking hard for them yet. However,
>     I have noticed an interesting difference from NFSv3 when running an
>     HA NFSv4 server. If you perform a manual failover with the following
>     sequence (a shell sketch follows the list):
>
>     1) Unconfigure the IP address on the server
>     2) Unexport the filesystem on the server
>     3) umount the filesystem on the server
>     4) mount the filesystem on the failover server
>     5) Export the filesystem on the failover server
>     6) Configure the IP address on the failover server
>
>     then the simple umount at step (3) doesn't complete if a client
>     still had files open; it seems to time out eventually. I need to
>     take a closer look at this.
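>
>     A minimal shell sketch of that failover sequence, assuming a
>     hypothetical service address of 192.168.1.50/24 on eth0 and made-up
>     export path /export/sge and shared device /dev/sdb1:
>
>        # On the primary server:
>        ip addr del 192.168.1.50/24 dev eth0   # 1) unconfigure service IP
>        exportfs -u '*:/export/sge'            # 2) unexport the filesystem
>        umount /export/sge                     # 3) the step that can hang
>                                               #    while clients hold files open
>        # On the failover server:
>        mount /dev/sdb1 /export/sge            # 4) mount the shared storage
>        exportfs -o rw,sync '*:/export/sge'    # 5) export it again
>        ip addr add 192.168.1.50/24 dev eth0   # 6) bring up the service IP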
>
>
> * Throughput: Submit a 10,000-job task array consisting of a job script
>     with just an "echo" command, to be executed on four 16-core servers
>     (a submission sketch follows the table):
>
>     Run  Spool   NFS Execd   NFS server  qmaster Time  Jobs/second
>          type    ver spool   load av.    (%cpu)  (sec)
>     ===  ======= === ======= =========== ======= ===== ===========
>      1   classic 3   shared    ~11           ?   1098      9
>      2   bdb     4   shared    ~11          ~5%   486      21
>      3   rpc/bdb 3   shared    ~11          ~3%   816      12
>
>      4   classic 3   local     ~0.5        ~20%   427      23
>      5   bdb     4   local     ~0.5         ~5%   418      24
>      6   rpc/bdb 3   local     ~0.5         ~3%   625      16
>
>     Only one result of each type was collected, so these figures are
>     indicative only.
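>
>     A sketch of that kind of submission (the script name and its echo
>     payload are just placeholders):
>
>        $ cat trivial.sh
>        #!/bin/sh
>        # The whole job is a single echo
>        echo "task $SGE_TASK_ID"
>
>        $ qsub -t 1-10000 trivial.sh    # submit the 10,000-task array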
>
>     Simple observations:
>
>     1) Putting the execd spool directories on local disk takes the
>        strain off my NFS server and its disks (see the configuration
>        sketch after this list).
>
>     2) "qstat" and friends were very sluggish when the qmaster was under
>        strain; using bdb in some form or other keeps my qmaster from
>        being strained.
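>
>     The local execd spool location is controlled by the execd_spool_dir
>     parameter in the cluster (or per-host) configuration; a sketch, with
>     a made-up local path:
>
>        $ qconf -mconf <exec_host>   # opens the host configuration in $EDITOR
>        # ...then point the spool at a local filesystem, e.g.:
>        execd_spool_dir   /var/spool/sge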
>
>
> * Setup:
>
>     NFS server   : SunFire X4150, 2x 2.5 GHz Harpertown, 16 GB RAM,
>                    StorageTek 2530 disk array. RHEL 5.4 x86_64.
>
>     qmaster      : SunFire X4150, 2x 2.5 GHz Harpertown, 16 GB RAM.
>                    RHEL 5.4 x86_64.
>
>     execd servers: SunFire X4440, 4x 2.7 GHz Shanghai. CentOS 5.3 x86_64.
>
>     "classic" - classic spooling, SGE_ROOT mounted NFSv3 by qmaster and
>                 execds
>
>     "bdb"     - bdb spooling, SGE_ROOT mounted NFSv4 by qmaster, NFSv3 by
>                 execds
>
>     "rpc/bdb" - bdb spooling, SGE_ROOT mounted NFSv3 by qmaster and execds,
>                 bdb spooling server running on NFS server.
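>
>     Roughly, the mounts behind those three labels (server name, export
>     paths and mount point are made up):
>
>        # "classic" and "rpc/bdb": SGE_ROOT over NFSv3 everywhere
>        mount -t nfs -o vers=3 nfsserver:/export/sge /opt/sge
>
>        # "bdb": SGE_ROOT over NFSv4 on the qmaster (path is relative to
>        # the server's NFSv4 pseudo-root); execds stay on NFSv3 as above
>        mount -t nfs4 nfsserver:/sge /opt/sge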
>
>     scheduler:
>        schedule_interval 0:0:1
>        flush_submit_sec      0
>        flush_finish_sec      0
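>
>     Those values are set by editing the scheduler configuration:
>
>        $ qconf -msconf    # opens the scheduler configuration in $EDITOR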
>
>
> If you got this far, thanks for reading :)
>
> Mark
