[GE users] Minor upgrade from 6.2 to 6.2u2

matbradford matthew.bradford at eds.com
Fri Mar 6 13:33:55 GMT 2009


BDB on local disk.

>-----Original Message-----
>From: andy [mailto:andy.schwierskott at sun.com]
>Sent: 06 March 2009 13:22
>To: users at gridengine.sunsource.net
>Subject: RE: [GE users] Minor upgrade from 6.2 to 6.2u2
>
>Mat,
>
>that smells quite fishy.
>
>What spooling method did you use? Classic or BDB?
>
>Andy
>
>
>On Fri, 6 Mar 2009, matbradford wrote:
>
>> Andy,
>>
>> I'm possibly doing something stupid...
>>
>> I've shut down the cluster, all daemons etc.
>> Added the patches using the tar.gz method for lx24-amd64 and common,
>> and when I attempt to restart the sgemaster, it fails and I get the
>> following message in the messages file:
>>
>> <date> main|<host>|ClSetUlong: wrong type for field CE_Consumable (lBoolT)
>>
>> I'm running on Suse Enterprise 10 on Xeon.
>>
>> Any ideas?
>>
>> Cheers,
>>
>> Mat
>>
>> >-----Original Message-----
>> >From: andy [mailto:andy.schwierskott at sun.com]
>> >Sent: 06 March 2009 08:54
>> >To: users at gridengine.sunsource.net
>> >Subject: Re: [GE users] Minor upgrade from 6.2 to 6.2u2
>> >
>> >Hi Mat,
>> >
>> >> What is the upgrade procedure for moving from SGE 6.2 to SGE 6.2u2?
>> >>
>> >> Do we need to follow the full upgrade procedure as defined in the
>> >> documentation as if we were moving from 6.1 to 6.2 or is there a
>> >> simplified process?
>> >
>> >
>> >the spool file formats haven't changed. So running and pending jobs can
>> >continue to stay in the system. Basically it's the usual things you have
>> >to ensure: don't overwrite a binary of a running daemon/process.
>> >
>> >See the long version below (taken from the patch installation
>> >instructions).
>> >The note about parallel jobs is probably over-cautious. If you also
>> >rename the qrsh/rsh/rshd/qrsh_starter binaries, I believe a running
>> >parallel job which has already started its parallel tasks would not be
>> >affected by an upgrade.
>> >
>> >
>> >Special Install Instructions:
>> >-----------------------------
>> >
>> >   Content
>> >   -------
>> >   Patch Installation
>> >      Stopping the Sun Grid Engine cluster to prevent start of new jobs
>> >      Shutting down the Sun Grid Engine daemons
>> >      Installing the patch and restarting the software
>> >   New functionality delivered with SGE 6.2 Update 2
>> >
>> >
>> >   Patch Installation
>> >   ------------------
>> >
>> >   These installation instructions assume that you are running a
>> >   homogeneous Sun Grid Engine cluster (called "the software") where all
>> >   hosts share the same directory for the binaries. If you are running the
>> >   software in a heterogeneous environment (mix of different binary
>> >   architectures), you need to apply the patch installation for all binary
>> >   architectures as well as the "common" and "arco" packages. See the
>> >   patch matrix above for details about the available patches.
>> >
>> >   If you upgrade from a previous version of Sun Grid Engine (for example
>> >   6.0), please perform the steps described in the Sun Grid Engine
>> >   documentation
>> >   (http://wikis.sun.com/display/gridengine62u2/Upgrading).
>> >
>> >   If you installed the software on local filesystems, you need to install
>> >   all relevant patches on all hosts where you installed the software
>> >   locally.
>> >
>> >   By default, there should be no running jobs when the patch is
>> >   installed. There may be pending batch jobs, but no pending interactive
>> >   jobs (qrsh, qmake, qsh, qtcsh, qlogin).
>> >
>> >   It is possible to install the patch with running batch jobs. To avoid a
>> >   failure of the active 'sge_shepherd' binary, it is necessary to move the
>> >   old shepherd binary (and copy it back prior to the installation of the
>> >   patch).
>> >
>> >   You cannot install the patch with running interactive jobs, 'qmake' jobs
>> >   or with running parallel jobs which use the tight integration support
>> >   (control_slaves=true is set in the PE configuration).
>> >
>> >   A. Stopping the Sun Grid Engine cluster to prevent start of new jobs
>> >   --------------------------------------------------------------------
>> >
>> >   Disable all queues so that no new jobs are started:
>> >
>> >      # qmod -d '*'
>> >
>> >   Optional (only needed if there are running jobs which should continue
>> >   to run when the patch is installed):
>> >
>> >      # cd $SGE_ROOT/bin
>> >      # mv <arch>/sge_shepherd <arch>/sge_shepherd.sge62
>> >      # cp <arch>/sge_shepherd.sge62 <arch>/sge_shepherd
>> >
>> >   It is important that the binary is moved with the "mv" command. It
>> >   should not be copied because this could cause the crash of an active
>> >   shepherd process which is currently running a job when the patch is
>> >   installed.
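
A minimal sketch of what the "mv" then "cp" sequence does at the filesystem
level. It uses a scratch file in a temporary directory, not a real
'sge_shepherd' binary; the variable names are made up for illustration, and
it assumes a POSIX shell with 'mktemp' available. After "mv", the saved
copy still has the original inode (the one a running process was started
from), while "cp" creates a new file that the patch can later overwrite.

```shell
#!/bin/sh
# Scratch demonstration only: does not touch $SGE_ROOT.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Stand-in for <arch>/sge_shepherd (hypothetical content).
printf 'old shepherd binary\n' > "$tmp/sge_shepherd"
inode_before=$(ls -i "$tmp/sge_shepherd" | awk '{print $1}')

# Same two steps as in the instructions: rename, then copy back.
mv "$tmp/sge_shepherd" "$tmp/sge_shepherd.sge62"
cp "$tmp/sge_shepherd.sge62" "$tmp/sge_shepherd"

inode_saved=$(ls -i "$tmp/sge_shepherd.sge62" | awk '{print $1}')
inode_new=$(ls -i "$tmp/sge_shepherd" | awk '{print $1}')

echo "original inode:             $inode_before"
echo "sge_shepherd.sge62 (moved): $inode_saved"
echo "sge_shepherd (fresh copy):  $inode_new"
```

The moved file reports the same inode number as the original, and the fresh
copy reports a different one, so unpacking the patch over 'sge_shepherd'
leaves the renamed file untouched.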
>> >
>> >   B. Shutting down the Sun Grid Engine daemons
>> >   --------------------------------------------
>> >
>> >   You need to shut down (and restart) the qmaster and scheduler daemon
>> >   and all running execution daemons.
>> >
>> >   Shut down all your execution daemons. Log in to all your execution
>> >   hosts and stop the execution daemons:
>> >
>> >      # $SGE_ROOT/$SGE_CELL/common/sgeexecd softstop
>> >
>> >   Then log in to your qmaster machine and stop qmaster and scheduler:
>> >
>> >      # $SGE_ROOT/$SGE_CELL/common/sgemaster stop
>> >
>> >   Now verify with the 'ps' command that all Sun Grid Engine daemons on all
>> >   hosts are stopped. If you decided to rename the 'sge_shepherd' binary so
>> >   that running jobs can continue to run during the patch installation, you
>> >   must not kill the 'sge_shepherd' processes.
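
One possible way to do the 'ps' check on a single host, as a sketch rather
than part of the patch instructions. The function name is made up, and the
process names 'sge_qmaster' and 'sge_execd' are assumed here alongside the
'sge_shepherd' named above. It only lists processes and kills nothing.

```shell
#!/bin/sh
# Reads "PID COMMAND" lines on stdin and prints those whose command name
# (with any leading path removed) is a Grid Engine daemon.
filter_sge_daemons() {
    awk '{
        name = $2
        sub(/.*\//, "", name)
        if (name ~ /^sge_(qmaster|execd|shepherd)$/) print
    }'
}

remaining=$(ps -e -o pid= -o comm= | filter_sge_daemons)

if [ -n "$remaining" ]; then
    echo "Grid Engine processes still present on $(uname -n):"
    echo "$remaining"
else
    echo "No Grid Engine daemons found on $(uname -n)"
fi
```

Any 'sge_shepherd' lines in the output are expected if running jobs were
left in place, and those processes should be left alone.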
>> >
>> >   C. Installing the patch and restarting the software
>> >   ---------------------------------------------------
>> >
>> >   Now install the patch with "patchadd" or by unpacking the 'tar.gz' files
>> >   included in this patch as outlined above.
>> >
>> >      Restarting the software
>> >      -----------------------
>> >
>> >      If you have configured ARCo, you must first complete steps 1 and 2
>> >      from the section "Stopping the Accounting and Reporting Console"
>> >      from the ARCo patch before restarting the qmaster.
>> >
>> >      Please log in to your qmaster machine and execution hosts and enter:
>> >
>> >         # $SGE_ROOT/$SGE_CELL/common/sgemaster
>> >         # $SGE_ROOT/$SGE_CELL/common/sgeexecd
>> >
>> >      After restarting the software, you may again enable your queues:
>> >
>> >         # qmod -e '*'
>> >
>> >      If you renamed the shepherd binary, you may safely delete the old
>> >      binary when all jobs which were running prior to the patch
>> >      installation have finished.
>> >
>> >
>> >Regards,
>> >Andy
>> >
>> >------------------------------------------------------
>> >http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122031
>> >
>> >To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122182
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>
>------------------------------------------------------
>http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122210
>
>To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122215

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


