[GE users] Initial findings with classic / BDB / RPC spooling over NFSv3 / NFSv4 on Linux

templedf dan.templeton at sun.com
Fri Jan 8 19:42:44 GMT 2010


Really interesting info.  Thanks for posting!

Daniel

ccaamad wrote:
> I know I've only scratched the surface on this, but I thought I'd share 
> some findings from today's *simplistic* comparisons of job throughput on a 
> RHEL 5.4 platform, with spooling options suitable for a qmaster/shadowd 
> install.
>
> I've not accounted for all sorts of things, but I found it interesting and 
> I hope others will also.
>
>
> * Comments on Berkeley DB (BDB) spooling on a Linux NFSv4 share:
>
>    I've not tried Kerberos authentication - only basic AUTH_SYS.
>
>    Although it's not supported, I've seen no immediate SGE problems,
>    though I've not been looking for them seriously yet. However, I have
>    noticed an interesting change vs. NFSv3 when the NFSv4 server is an
>    HA pair. My manual failover sequence is:
>
>    1) Unconfigure the IP address on the server
>    2) Unexport the filesystem on the server
>    3) umount the filesystem on the server
>    4) mount the filesystem on the failover server
>    5) Export the filesystem on the failover server
>    6) Configure the IP address on the failover server
>
>    With NFSv4, the simple umount at step (3) doesn't complete if a client
>    still had files open; it seems to time out eventually. I need to take
>    a closer look at this. A rough sketch of the commands is given below.
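>
>    For reference, the commands behind those six steps look something like
>    this (the service IP, interface, device and paths are illustrative
>    only, not my real values):
>
>       # --- on the active NFS server ---
>       ip addr del 192.168.1.50/24 dev eth0   # 1) drop the service IP
>       exportfs -u '*:/export/sge'            # 2) unexport (client spec must
>                                              #    match the original export)
>       umount /export/sge                     # 3) hangs if NFSv4 clients
>                                              #    still hold files open
>
>       # --- on the failover NFS server ---
>       mount /dev/mapper/sgevol /export/sge   # 4) mount the shared storage
>       exportfs -o rw,sync '*:/export/sge'    # 5) re-export it
>       ip addr add 192.168.1.50/24 dev eth0   # 6) bring up the service IP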
>
>
> * Throughput: submit a 10,000-task array job whose job script contains
>    just an "echo" command, to execute across four 16-core servers (a
>    sample submission is sketched after the observations below):
>
>    Run  Spool    NFS  Execd   NFS server  qmaster  Time   Jobs/second
>         type     ver  spool   load av.    (%cpu)   (sec)
>    ===  =======  ===  ======  ==========  =======  =====  ===========
>     1   classic  3    shared  ~11         ?        1098   9
>     2   bdb      4    shared  ~11         ~5%      486    21
>     3   rpc/bdb  3    shared  ~11         ~3%      816    12
>
>     4   classic  3    local   ~0.5        ~20%     427    23
>     5   bdb      4    local   ~0.5        ~5%      418    24
>     6   rpc/bdb  3    local   ~0.5        ~3%      625    16
>
>    Only one result of each type was collected, so these figures are
>    indicative only.
>
>    Simple observations:
>
>    1) Putting the execd spool directories on local disk takes the strain
>       off my NFS server and its disks.
>
>    2) "qstat" and friends were very sluggish whenever the qmaster was
>       under strain. Using bdb in some form or other keeps my qmaster
>       from being put under strain.
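>
>    For anyone wanting to repeat the test, the submission was along these
>    lines (the script name and output handling are illustrative, not my
>    exact commands):
>
>       # create a trivial job script containing just an echo
>       printf '#!/bin/sh\necho hello\n' > echo.sh
>
>       # submit it as a 10,000-task array job, discarding stdout/stderr
>       qsub -t 1-10000 -o /dev/null -e /dev/null echo.sh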
>
>
> * Setup:
>
>    NFS server   : SunFire X4150, 2x 2.5 GHz Harpertown, 16 GB RAM,
>                   StorageTek 2530 disk array. RHEL 5.4 x86_64.
>
>    qmaster      : SunFire X4150, 2x 2.5 GHz Harpertown, 16 GB RAM.
>                   RHEL 5.4 x86_64.
>
>    execd servers: SunFire X4440, 4x 2.7 GHz Shanghai. CentOS 5.3 x86_64.
>
>    "classic" - classic spooling, SGE_ROOT mounted NFSv3 by qmaster and
>                execds
>
>    "bdb"     - bdb spooling, SGE_ROOT mounted NFSv4 by qmaster, NFSv3 by
>                execds
>
>    "rpc/bdb" - bdb spooling, SGE_ROOT mounted NFSv3 by qmaster and execds,
>                bdb spooling server running on NFS server.
>
>    scheduler:
>       schedule_interval  0:0:1
>       flush_submit_sec   0
>       flush_finish_sec   0
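>
>    To make the NFS arrangement concrete, the SGE_ROOT mounts were along
>    these lines (server name, export and mount point are illustrative
>    only):
>
>       # NFSv3 mount of SGE_ROOT (execds in all runs; qmaster in the
>       # classic and rpc/bdb runs)
>       mount -t nfs -o rw,hard,intr,vers=3 nfsserver:/export/sge /opt/sge
>
>       # NFSv4 mount of SGE_ROOT (qmaster in the bdb runs); note the path
>       # is relative to the server's NFSv4 pseudo-root (fsid=0 export)
>       mount -t nfs4 -o rw,hard,intr nfsserver:/sge /opt/sge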
>
>
> If you got to here, thanks for reading :)
>
> Mark
>
