[GE users] Minor upgrade from 6.2 to 6.2u2

andy andy.schwierskott at sun.com
Fri Mar 6 08:53:47 GMT 2009


Hi Mat,

> What is the upgrade procedure for moving from SGE 6.2 to SGE6.2u2?
>
> Do we need to follow the full upgrade procedure as defined in the
> documentation as if we were moving from 6.1 to 6.2 or is there a
> simplified process?


The spool file formats haven't changed, so running and pending jobs can
stay in the system. Basically it's the usual thing you have to ensure:
don't overwrite the binary of a running daemon or process.

See the long version below (taken from the patch installation instructions).
The note about parallel jobs is probably overly cautious. If you rename
the qrsh/rsh/rshd/qrsh_starter binaries as well, I believe a running
parallel job which has already started its parallel tasks would not be
affected by an upgrade.


Special Install Instructions:
-----------------------------

   Content
   -------
   Patch Installation
      Stopping the Sun Grid Engine cluster to prevent start of new jobs
      Shutting down the Sun Grid Engine daemons
      Installing the patch and restarting the software
   New functionality delivered with SGE 6.2 Update 2


   Patch Installation
   ------------------

   These installation instructions assume that you are running a homogeneous
   Sun Grid Engine cluster (called "the software") where all hosts share the
   same directory for the binaries. If you are running the software in a
   heterogeneous environment (mix of different binary architectures), you
   need to apply the patch installation for all binary architectures as well
   as the "common" and "arco" packages. See the patch matrix above for
   details about the available patches.

   If you upgrade from a previous version of Sun Grid Engine (for example
   6.0), please perform the steps described in the Sun Grid Engine
   documentation.  (http://wikis.sun.com/display/gridengine62u2/Upgrading)

   If you installed the software on local filesystems, you need to install
   all relevant patches on all hosts where you installed the software
   locally.

   By default, there should be no running jobs when the patch is installed.
   There may be pending batch jobs, but no pending interactive jobs (qrsh,
   qmake, qsh, qtcsh, qlogin).

   It is possible to install the patch with running batch jobs. To avoid a
   failure of the active 'sge_shepherd' binary, it is necessary to move the
   old shepherd binary to a new name (and copy it back to its original name
   prior to the installation of the patch).

   You cannot install the patch with running interactive jobs, 'qmake' jobs
   or with running parallel jobs which use the tight integration support
   (control_slaves=true is set in the PE configuration).
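
   One way to find out which parallel environments use tight integration
   is to inspect each PE configuration. The following is a minimal sketch,
   not part of the original patch instructions: the function names are
   hypothetical, and it assumes 'qconf' is in the PATH and that 'qconf -sp'
   prints a line of the form "control_slaves TRUE".

```shell
#!/bin/sh
# Reads the output of `qconf -sp <pe_name>` on stdin and returns 0 if the
# PE has control_slaves set to TRUE (tight integration), 1 otherwise.
# (Hypothetical helper name, for illustration only.)
pe_is_tightly_integrated() {
    awk '$1 == "control_slaves" { v = toupper($2) }
         END { exit (v == "TRUE" ? 0 : 1) }'
}

# Prints the name of every PE that uses tight integration.
# Assumes qconf is available; `qconf -spl` lists all PE names.
list_tight_pes() {
    for pe in $(qconf -spl); do
        if qconf -sp "$pe" | pe_is_tightly_integrated; then
            echo "$pe"
        fi
    done
}
```

   Calling list_tight_pes on a submit or admin host prints the affected PE
   names; running jobs in those PEs can then be looked up with
   "qstat -s r -u '*'".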

   A. Stopping the Sun Grid Engine cluster to prevent start of new jobs
   --------------------------------------------------------------------

   Disable all queues so that no new jobs are started:

      # qmod -d '*'

   Optional (only needed if there are running jobs which should continue to
   run when the patch is installed):

      # cd $SGE_ROOT/bin
      # mv <arch>/sge_shepherd <arch>/sge_shepherd.sge62
      # cp <arch>/sge_shepherd.sge62 <arch>/sge_shepherd

   It is important that the binary is moved with the "mv" command. It should
   not be copied, because an active shepherd process which is currently
   running a job could crash when the patch overwrites its binary.
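
   The difference between "mv" and "cp" can be seen from the inode numbers.
   The sketch below is an illustration only and is not part of the original
   instructions: it works in a scratch directory on a dummy file (no Grid
   Engine files are touched) and assumes 'mktemp -d' is available. The
   helper name inode_of is hypothetical.

```shell
#!/bin/sh
# Print the inode number of the file given as $1.
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

# Work on a dummy file in a scratch directory.
demo_dir=$(mktemp -d)
echo "old shepherd" > "$demo_dir/sge_shepherd"
before=$(inode_of "$demo_dir/sge_shepherd")

# Same sequence as in the instructions above: rename, then copy back.
mv "$demo_dir/sge_shepherd" "$demo_dir/sge_shepherd.sge62"
cp "$demo_dir/sge_shepherd.sge62" "$demo_dir/sge_shepherd"

renamed=$(inode_of "$demo_dir/sge_shepherd.sge62")
fresh=$(inode_of "$demo_dir/sge_shepherd")

echo "original inode:        $before"
echo "sge_shepherd.sge62:    $renamed  (still the original file)"
echo "new sge_shepherd copy: $fresh  (a separate file)"

rm -rf "$demo_dir"
```

   The renamed file keeps the original inode, so a process that is already
   executing it is not affected when the new copy under the old name is
   replaced later.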

   B. Shutting down the Sun Grid Engine daemons
   --------------------------------------------

   You need to shut down (and restart) the qmaster and scheduler daemon and
   all running execution daemons.

   Shut down all your execution hosts. Log in to all your execution hosts
   and stop the execution daemons:

      # $SGE_ROOT/$SGE_CELL/common/sgeexecd softstop
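
   On clusters with many execution hosts it can help to generate the
   per-host commands first and review them. This is a minimal sketch and
   not part of the original instructions: it assumes root ssh access to
   every execution host and the same $SGE_ROOT and $SGE_CELL on all of
   them, and the function name is hypothetical. It only prints commands
   and executes nothing.

```shell
#!/bin/sh
# Reads execution host names (one per line) on stdin and prints one ssh
# command per host. Empty lines are skipped. Nothing is executed here.
# Assumption: root ssh login and identical SGE_ROOT/SGE_CELL on all hosts.
print_softstop_commands() {
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        printf 'ssh root@%s %s/%s/common/sgeexecd softstop\n' \
            "$host" "$SGE_ROOT" "$SGE_CELL"
    done
}
```

   For example, "qconf -sel | print_softstop_commands" prints one line per
   configured execution host ('qconf -sel' shows the execution host list).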

   Then log in to your qmaster machine and stop qmaster and scheduler:

      # $SGE_ROOT/$SGE_CELL/common/sgemaster stop

   Now verify with the 'ps' command that all Sun Grid Engine daemons on all
   hosts are stopped. If you decided to rename the 'sge_shepherd' binary so
   that running jobs can continue to run during the patch installation, you
   must not kill the running 'sge_shepherd' processes.
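
   The 'ps' check can be sketched as a small filter. This is an
   illustration under assumptions, not part of the original instructions:
   the function name is hypothetical, and sge_schedd is listed only in case
   a separate scheduler process exists on your installation. sge_shepherd
   is deliberately not matched, since shepherds of running jobs must stay
   alive.

```shell
#!/bin/sh
# Reads command names (one per line) on stdin, e.g. from `ps -e -o comm=`,
# and prints those that are Grid Engine daemons which should be stopped.
# A leading directory path is stripped before matching.
filter_sge_daemons() {
    while IFS= read -r name; do
        case "${name##*/}" in
            sge_qmaster|sge_schedd|sge_execd)
                echo "${name##*/}"
                ;;
        esac
    done
}
```

   Running "ps -e -o comm= | filter_sge_daemons" on each host should print
   nothing once all daemons are stopped.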

   C. Installing the patch and restarting the software
   ---------------------------------------------------

   Now install the patch, either with "patchadd" or by unpacking the
   'tar.gz' files included in this patch as outlined above.

      Restarting the software
      -----------------------

      If you have configured ARCo, you must first complete steps 1 and 2
      from the section "Stopping the Accounting and Reporting Console" from
      the ARCo patch before restarting the qmaster.

      Please log in to your qmaster machine and execution hosts and enter:

         # $SGE_ROOT/$SGE_CELL/common/sgemaster
         # $SGE_ROOT/$SGE_CELL/common/sgeexecd

      After restarting the software, you may again enable your queues:

         # qmod -e '*'

      If you renamed the shepherd binary, you may safely delete the old
      binary when all jobs which were running prior to the patch
      installation have finished.


Regards,
Andy

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122031