[GE users] Minor upgrade from 6.2 to 6.2u2

matbradford matthew.bradford at eds.com
Fri Mar 6 12:49:46 GMT 2009


I've run the upgrade through the standard inst_sge -upd process, using a
saved configuration, and it all works OK.

Cheers,

Mat

>-----Original Message-----
>From: matbradford [mailto:matthew.bradford at eds.com]
>Sent: 06 March 2009 12:42
>To: users at gridengine.sunsource.net
>Subject: RE: [GE users] Minor upgrade from 6.2 to 6.2u2
>
>Andy,
>
>I'm possibly doing something stupid...
>
>I've shut down the cluster, all daemons etc.
>Added the patches using the tar.gz method for lx24-amd64 and common and
>when I attempt to restart the sgemaster, it fails and I get the
>following message in the messages file:
>
><date> main|<host>|ClSetUlong: wrong type for field CE_Consumable (lBoolT)
>
>I'm running on Suse Enterprise 10 on Xeon.
>
>Any ideas?
>
>Cheers,
>
>Mat
>
>>-----Original Message-----
>>From: andy [mailto:andy.schwierskott at sun.com]
>>Sent: 06 March 2009 08:54
>>To: users at gridengine.sunsource.net
>>Subject: Re: [GE users] Minor upgrade from 6.2 to 6.2u2
>>
>>Hi Mat,
>>
>>> What is the upgrade procedure for moving from SGE 6.2 to SGE6.2u2?
>>>
>>> Do we need to follow the full upgrade procedure as defined in the
>>> documentation as if we were moving from 6.1 to 6.2 or is there a
>>> simplified process?
>>
>>
>>the spool file formats haven't changed, so running and pending jobs can
>>continue to stay in the system. Basically it's the usual thing you have to
>>ensure: don't overwrite the binary of a running daemon/process.
>>
>>See the long version below (taken from the patch installation
>>instructions).
>>The note about parallel jobs is probably over-cautious. If you also
>>rename the qrsh/rsh/rshd/qrsh_starter binaries, I believe a running
>>parallel job which has already started its parallel tasks would not be
>>affected by an upgrade.
>>
>>
>>Special Install Instructions:
>>-----------------------------
>>
>>   Content
>>   -------
>>   Patch Installation
>>      Stopping the Sun Grid Engine cluster to prevent start of new jobs
>>      Shutting down the Sun Grid Engine daemons
>>      Installing the patch and restarting the software
>>   New functionality delivered with SGE 6.2 Update 2
>>
>>
>>   Patch Installation
>>   ------------------
>>
>>   These installation instructions assume that you are running a homogeneous
>>   Sun Grid Engine cluster (called "the software") where all hosts share the
>>   same directory for the binaries. If you are running the software in a
>>   heterogeneous environment (mix of different binary architectures), you
>>   need to apply the patch installation for all binary architectures as well
>>   as the "common" and "arco" packages. See the patch matrix above for
>>   details about the available patches.
>>
>>   If you upgrade from a previous version of Sun Grid Engine (for example
>>   6.0), please perform the steps described in the Sun Grid Engine
>>   documentation (http://wikis.sun.com/display/gridengine62u2/Upgrading).
>>
>>   If you installed the software on local filesystems, you need to install
>>   all relevant patches on all hosts where you installed the software
>>   locally.
>>
>>   By default, there should be no running jobs when the patch is installed.
>>   There may be pending batch jobs, but no pending interactive jobs (qrsh,
>>   qmake, qsh, qtcsh, qlogin).
>>
>>   It is possible to install the patch with running batch jobs. To avoid a
>>   failure of the active 'sge_shepherd' binary, it is necessary to move the
>>   old shepherd binary (and copy it back prior to the installation of the
>>   patch).
>>
>>   You cannot install the patch with running interactive jobs, 'qmake' jobs
>>   or with running parallel jobs which use the tight integration support
>>   (control_slaves=true is set in the PE configuration).
>>
>>   A. Stopping the Sun Grid Engine cluster to prevent start of new jobs
>>   --------------------------------------------------------------------
>>
>>   Disable all queues so that no new jobs are started:
>>
>>      # qmod -d '*'
>>
>>   Optional (only needed if there are running jobs which should continue to
>>   run when the patch is installed):
>>
>>      # cd $SGE_ROOT/bin
>>      # mv <arch>/sge_shepherd <arch>/sge_shepherd.sge62
>>      # cp <arch>/sge_shepherd.sge62 <arch>/sge_shepherd
>>
>>   It is important that the binary is moved with the "mv" command. It should
>>   not be copied because this could cause the crash of an active shepherd
>>   process which is currently running a job when the patch is installed.
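
A minimal sketch of why the rename matters, run in a scratch directory
rather than under $SGE_ROOT. It assumes the usual Unix behaviour that "mv"
within one filesystem keeps the file's inode while "cp" creates a new one.
The file below is only a stand-in for the real binary, and the variable
names are made up for this illustration.

```shell
#!/bin/sh
# Scratch-directory demonstration; no Grid Engine files are touched.
demo_dir=$(mktemp -d)

# Stand-in for the installed shepherd binary (assumption: plain text file).
printf 'old binary\n' > "$demo_dir/sge_shepherd"
inode_before=$(ls -i "$demo_dir/sge_shepherd" | awk '{print $1}')

# Rename: the original inode now carries the new name.
mv "$demo_dir/sge_shepherd" "$demo_dir/sge_shepherd.sge62"
# Copy back: a new inode appears under the old name.
cp "$demo_dir/sge_shepherd.sge62" "$demo_dir/sge_shepherd"

inode_renamed=$(ls -i "$demo_dir/sge_shepherd.sge62" | awk '{print $1}')
inode_copy=$(ls -i "$demo_dir/sge_shepherd" | awk '{print $1}')

echo "original inode:           $inode_before"
echo "sge_shepherd.sge62 inode: $inode_renamed"
echo "new sge_shepherd inode:   $inode_copy"
```

A shepherd started before the rename keeps using the original inode, which
now carries the name sge_shepherd.sge62, so the patch only replaces the
fresh copy.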
>>
>>   B. Shutting down the Sun Grid Engine daemons
>>   --------------------------------------------
>>
>>   You need to shut down (and restart) the qmaster and scheduler daemon and
>>   all running execution daemons.
>>
>>   Shut down all your execution hosts. Log in to all your execution hosts and
>>   stop the execution daemons:
>>
>>      # $SGE_ROOT/$SGE_CELL/common/sgeexecd softstop
>>
>>   Then log in to your qmaster machine and stop qmaster and scheduler:
>>
>>      # $SGE_ROOT/$SGE_CELL/common/sgemaster stop
>>
>>   Now verify with the 'ps' command that all Sun Grid Engine daemons on all
>>   hosts are stopped. If you decided to rename the 'sge_shepherd' binary so
>>   that running jobs can continue to run during the patch installation, you
>>   must not kill the 'sge_shepherd' processes.
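
The 'ps' check could be sketched as a small filter like the one below. The
function name is made up for this illustration, and the daemon names
matched (sge_qmaster, sge_schedd, sge_execd) are an assumption about what
runs on a given host; adjust them to your installation. sge_shepherd is
left out on purpose so that shepherds of running jobs are not flagged.

```shell
# Filter "pid command" lines (as printed by: ps -e -o pid= -o comm=) and
# print only those that look like Grid Engine daemons.
# Assumption: daemon process names are sge_qmaster, sge_schedd, sge_execd.
# sge_shepherd is deliberately not matched.
sge_daemons_running() {
    awk '$2 ~ /(^|\/)sge_(qmaster|schedd|execd)$/ { print $1, $2 }'
}
```

Run on each host, it prints nothing once the daemons are stopped:

      # ps -e -o pid= -o comm= | sge_daemons_running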
>>
>>   C. Installing the patch and restarting the software
>>   ---------------------------------------------------
>>
>>   Now install the patch, either with "patchadd" or by unpacking the
>>   'tar.gz' files included in this patch as outlined above.
>>
>>      Restarting the software
>>      -----------------------
>>
>>      If you have configured ARCo, you must first complete steps 1 and 2
>>      from the section "Stopping the Accounting and Reporting Console" from
>>      the ARCo patch before restarting the qmaster.
>>
>>      Please log in to your qmaster machine and execution hosts and enter:
>>
>>         # $SGE_ROOT/$SGE_CELL/common/sgemaster
>>         # $SGE_ROOT/$SGE_CELL/common/sgeexecd
>>
>>      After restarting the software, you may again enable your queues:
>>
>>         # qmod -e '*'
>>
>>      If you renamed the shepherd binary, you may safely delete the old
>>      binary when all jobs which were running prior to the patch installation
>>      have finished.
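
For reference, the order of steps A to C can be sketched as a dry-run
checklist. The function below only prints the commands quoted in the
instructions and runs none of them. Its name and its two parameters (the
architecture directory and a yes/no flag for keeping running batch jobs)
are made up for this illustration, and the ARCo steps are not covered.

```shell
# Print (do not run) the patch steps in order.
# $1: architecture directory under $SGE_ROOT/bin, e.g. lx24-amd64
# $2: "yes" if running batch jobs should survive the patch
# Both parameters are hypothetical conveniences for this sketch.
print_patch_plan() {
    arch=$1
    keep_jobs=$2
    printf '%s\n' "qmod -d '*'"
    if [ "$keep_jobs" = "yes" ]; then
        printf '%s\n' "mv \$SGE_ROOT/bin/$arch/sge_shepherd \$SGE_ROOT/bin/$arch/sge_shepherd.sge62"
        printf '%s\n' "cp \$SGE_ROOT/bin/$arch/sge_shepherd.sge62 \$SGE_ROOT/bin/$arch/sge_shepherd"
    fi
    printf '%s\n' "\$SGE_ROOT/\$SGE_CELL/common/sgeexecd softstop   (on every execution host)"
    printf '%s\n' "\$SGE_ROOT/\$SGE_CELL/common/sgemaster stop   (on the qmaster host)"
    printf '%s\n' "verify with ps that all daemons are stopped"
    printf '%s\n' "install the patch (patchadd or unpack the tar.gz files)"
    printf '%s\n' "\$SGE_ROOT/\$SGE_CELL/common/sgemaster   (on the qmaster host)"
    printf '%s\n' "\$SGE_ROOT/\$SGE_CELL/common/sgeexecd   (on every execution host)"
    printf '%s\n' "qmod -e '*'"
}
```

For example, to list the steps for a cluster with running batch jobs:

      # print_patch_plan lx24-amd64 yes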
>>
>>
>>Regards,
>>Andy
>>
>>------------------------------------------------------
>>http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122031
>>
>>To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>
>------------------------------------------------------
>http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122182
>
>To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122186

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].