[GE users] Heads-up: Grid Engine 6.2 Beta starting soon

Andy Schwierskott andy.schwierskott at sun.com
Mon May 5 14:40:00 BST 2008


Please mark your calendars: we are starting the Grid Engine 6.2 Beta on
Tuesday next week. We ask you to set aside time and help us, through your
active participation and feedback, to make SGE 6.2 a bulletproof release.

The Beta is your chance to verify whether your current SGE environment will
work smoothly with SGE 6.2. The Beta runs until July 4. In the week of June 16
we plan a Beta refresh that will include the upgrade procedure from 6.0/6.1.

The official announcement and download URL will be sent out next week once
we go live.

What's new in SGE 6.2
1. Service Domain Manager
2. New in SGE "core" and Accounting and Reporting Console (ARCo)
3. Moving SGE documentation set to http://wikis.sun.com

1. Service Domain Manager
    - a completely new module extends the scope of SGE to a new domain of
      use cases: the Service Domain Manager (SDM), a.k.a. project Hedeby,
      will make it possible to dynamically (re-)assign computational
      resources on demand through configurable SLOs (Service Level
      Objectives), measured by KPIs (Key Performance Indicators) of the
      managed resources. Through adapters it will be possible to add
      virtually any type of service, for example an SGE cluster, an
      application server farm, or other types of Grid middleware
      applications. Manageable resources include, e.g., physical or virtual
      hosts and software licenses.

      This release of SDM will support managing two or more SGE clusters.
      Future releases will add more adapters and support for GreenIT
      (managing spare pools of resources). An integration with Sun's
      virtualization stacks is planned.

2. New in SGE "core" and Accounting and Reporting Console (ARCo)
    - Scalability improvements - scale up to 63000 cores
      SGE manages Sun's HPC cluster at the Texas Advanced Computing Center
      (TACC) with almost 4000 physical hosts. The improvements, some of which
      are already in production at TACC, will now be made generally
      available:
       - improved qmaster-execd protocol
       - scheduler running as a thread in qmaster (faster communication
         between qmaster and scheduler threads)
       - improved resource matching in scheduler
       - faster qmaster startup
       - reduced memory requirements for big clusters
       - deployment of SGE in bigger clusters
       - better resource utilization and job throughput in challenging
    - Advance Reservation (AR)
       - specified resources (e.g. machines, licenses) will be available for
         a guaranteed time window for specified users or user groups
       - predictable access to exclusive resources
    - New implementation of Interactive Job Support (IJS)
      - IJS is now implemented natively; external commands (rlogin, telnet,
        ssh) are no longer needed
        - Results in accurate job accounting (not possible when ssh or
          qlogin was used before).
        - Faster interactive and parallel job start
        - Can start massively parallel jobs (1000 hosts, 16000 slots)
    - Array Job Interdependencies
        - start task N of a successor array job as soon as task N of the
          predecessor array job has completed; interdependencies can be
          defined flexibly
        - interleaving array job tasks increases array job throughput
    - Multicluster support for Accounting and Reporting Console (ARCo)
        - collect data of multiple SGE clusters in a single (physical)
          database
        - run queries and get reports for multiple ARCo databases from a
          single Web GUI
        - use one reporting GUI to query multiple SGE reporting databases
        - no need for multiple ARCo DBs and GUIs when SDM is used
    - ARCo scalability improvements
        - faster dbwriter (the component that writes SGE reporting data into
          the database)
        - faster queries
        - improved ARCo usability in clusters with large amounts of collected
          reporting data
    - improved ARCo installation experience
        - the documentation on ARCo installation has been significantly
          improved
        - fewer installation hurdles
        - easier ARCo deployment
    - Java Virtual Machine running as a thread in qmaster
        - A JVM thread hosting a JMX server allows Java applications to
          contact qmaster natively (the API is not yet officially
          supported). SDM uses this new JMX interface.
    - Support for Solaris 10 startup via SMF (Service Management Facility)
    - Support for Sun Service Tags - inventory management through
      Sun Connected Services
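As a rough illustration of the Array Job Interdependencies item above, the
sketch below contrasts the task-level rule (successor task N may start once
predecessor task N has completed) with a classic whole-job hold. This is not
SGE code: the function names and the set-of-task-IDs model are hypothetical,
chosen only to show why interleaving helps.

```python
from typing import Set


def task_level_hold(predecessor_done: Set[int], successor_pending: Set[int]) -> Set[int]:
    """Hypothetical model of the array job interdependency rule:
    successor task N becomes eligible once predecessor task N is done."""
    return {n for n in successor_pending if n in predecessor_done}


def job_level_hold(
    predecessor_all: Set[int], predecessor_done: Set[int], successor_pending: Set[int]
) -> Set[int]:
    """Classic whole-job dependency: no successor task may start until
    every predecessor task has completed."""
    if predecessor_all <= predecessor_done:
        return set(successor_pending)
    return set()


if __name__ == "__main__":
    predecessor_all = {1, 2, 3, 4}
    predecessor_done = {1, 3}        # tasks 2 and 4 are still running
    successor_pending = {1, 2, 3, 4}

    print(sorted(task_level_hold(predecessor_done, successor_pending)))  # [1, 3]
    print(sorted(job_level_hold(predecessor_all, predecessor_done, successor_pending)))  # []
```

With predecessor tasks 1 and 3 finished, the task-level rule already releases
successor tasks 1 and 3, while the whole-job hold releases nothing. This
interleaving is what allows array job throughput to improve.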

3. Moving SGE documentation set to http://wikis.sun.com
    - SGE is one of the first Sun products to move its *entire*
      documentation set, covering both SGE and SDM, to wikis.sun.com.
    - with many Sun products now open source, the closed and often slow way
      of providing documentation to Sun customers and our communities is
      becoming less adequate. As SGE was one of Sun's first larger products
      to go open source, we are now taking the next step: modernizing,
      simplifying, and speeding up access and contributions to our
      documentation set.

      - enable quicker response to customer feedback (the Wiki's feedback
        function sends comments directly to the documentation owners)
      - enable engineers and other product stakeholders to contribute
        updates and corrections to the documentation directly
      - the Wiki can be opened to interested (and Sun-approved) users so
        that they can edit content directly
