[GE users] Minor upgrade from 6.2 to 6.2u2

andy andy.schwierskott at sun.com
Fri Mar 6 13:21:45 GMT 2009


Mat,

that smells quite fishy.

What spooling method did you use? Classic or BDB?
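The spooling method is recorded in the cell's bootstrap file, so it can be looked up instead of remembered. This is a minimal sketch that assumes the default location `$SGE_ROOT/$SGE_CELL/common/bootstrap` and a line of the form `spooling_method <value>` in that file. The value is typically `classic` or `berkeleydb`. The function name `spooling_method` is a hypothetical helper written for this example.

```shell
#!/bin/sh
# spooling_method [BOOTSTRAP_FILE]: hypothetical helper, not part of SGE.
# Prints the value of the spooling_method line in the cell's bootstrap
# file (assumed default: $SGE_ROOT/$SGE_CELL/common/bootstrap).
spooling_method() {
    sm_file=${1:-$SGE_ROOT/$SGE_CELL/common/bootstrap}
    if [ ! -r "$sm_file" ]; then
        echo "cannot read $sm_file" >&2
        return 1
    fi
    # Print the second field of the first "spooling_method" line and
    # exit non-zero if the file has no such line.
    awk '$1 == "spooling_method" { print $2; found = 1; exit }
         END { if (!found) exit 1 }' "$sm_file"
}
```

Run it on the qmaster host with `SGE_ROOT` and `SGE_CELL` set, or pass the bootstrap path explicitly.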

Andy


On Fri, 6 Mar 2009, matbradford wrote:

> Andy,
>
> I'm possibly doing something stupid...
>
> I've shut down the cluster, all daemons, etc.
> Added the patches using the tar.gz method for lx24-amd64 and common and
> when I attempt to restart the sgemaster, it fails and I get the
> following message in the messages file:
>
> <date> main|<host>|ClSetUlong: wrong type for field CE_Consumable (lBoolT)
>
> I'm running on Suse Enterprise 10 on Xeon.
>
> Any ideas?
>
> Cheers,
>
> Mat
>
> >-----Original Message-----
> >From: andy [mailto:andy.schwierskott at sun.com]
> >Sent: 06 March 2009 08:54
> >To: users at gridengine.sunsource.net
> >Subject: Re: [GE users] Minor upgrade from 6.2 to 6.2u2
> >
> >Hi Mat,
> >
> >> What is the upgrade procedure for moving from SGE 6.2 to SGE 6.2u2?
> >>
> >> Do we need to follow the full upgrade procedure as defined in the
> >> documentation as if we were moving from 6.1 to 6.2 or is there a
> >> simplified process?
> >
> >
> >The spool file formats haven't changed, so running and pending jobs can
> >stay in the system. Basically, it's the usual thing you have to ensure:
> >don't overwrite the binary of a running daemon/process.
> >
> >See the long version below (taken from the patch installation
> >instructions). The note about parallel jobs is probably over-cautious. If
> >you also rename the qrsh/rsh/rshd/qrsh_starter binaries, I believe a
> >running parallel job that has already started its parallel task would not
> >be affected by an upgrade.
> >
> >
> >Special Install Instructions:
> >-----------------------------
> >
> >   Content
> >   -------
> >   Patch Installation
> >      Stopping the Sun Grid Engine cluster to prevent start of new jobs
> >      Shutting down the Sun Grid Engine daemons
> >      Installing the patch and restarting the software
> >   New functionality delivered with SGE 6.2 Update 2
> >
> >
> >   Patch Installation
> >   ------------------
> >
> >   These installation instructions assume that you are running a
> >   homogeneous Sun Grid Engine cluster (called "the software") where all
> >   hosts share the same directory for the binaries. If you are running the
> >   software in a heterogeneous environment (mix of different binary
> >   architectures), you need to apply the patch installation for all binary
> >   architectures as well as the "common" and "arco" packages. See the patch
> >   matrix above for details about the available patches.
> >
> >   If you upgrade from a previous version of Sun Grid Engine (for example
> >   6.0), please perform the steps described in the Sun Grid Engine
> >   documentation (http://wikis.sun.com/display/gridengine62u2/Upgrading).
> >
> >   If you installed the software on local filesystems, you need to install
> >   all relevant patches on all hosts where you installed the software
> >   locally.
> >
> >   By default, there should be no running jobs when the patch is installed.
> >   There may be pending batch jobs, but no pending interactive jobs (qrsh,
> >   qmake, qsh, qtcsh, qlogin).
> >
> >   It is possible to install the patch with running batch jobs. To avoid a
> >   failure of the active 'sge_shepherd' binary, it is necessary to move the
> >   old shepherd binary aside (and copy it back to its original name prior
> >   to the installation of the patch).
> >
> >   You cannot install the patch with running interactive jobs, 'qmake'
> >   jobs, or running parallel jobs that use the tight integration support
> >   (control_slaves=true is set in the PE configuration).
> >
> >   A. Stopping the Sun Grid Engine cluster to prevent start of new jobs
> >   --------------------------------------------------------------------
> >
> >   Disable all queues so that no new jobs are started:
> >
> >      # qmod -d '*'
> >
> >   Optional (only needed if there are running jobs which should continue
> >   to run when the patch is installed):
> >
> >      # cd $SGE_ROOT/bin
> >      # mv <arch>/sge_shepherd <arch>/sge_shepherd.sge62
> >      # cp <arch>/sge_shepherd.sge62 <arch>/sge_shepherd
> >
> >   It is important that the binary is moved with the "mv" command. It
> >   should not be copied, because this could crash an active shepherd
> >   process that is running a job when the patch is installed.
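Step A boils down to one rule: rename first (so running shepherds keep their inode), then create a fresh copy under the original name. The sketch below wraps that in a function. `preserve_binary` and `disable_queues` are hypothetical helpers written for this example, and the `.sge62` suffix simply mirrors the commands above. If you follow Andy's suggestion about qrsh/rsh/rshd/qrsh_starter, the same helper could be pointed at those binaries too. That suggestion is his belief, not something the patch instructions confirm.

```shell
#!/bin/sh
# Sketch of step A. preserve_binary and disable_queues are hypothetical
# helpers for this example; they are not part of Sun Grid Engine.

# preserve_binary DIR NAME [SUFFIX]
#   mv DIR/NAME -> DIR/NAME.SUFFIX, then cp it back to DIR/NAME.
preserve_binary() {
    pb_dir=$1
    pb_name=$2
    pb_suffix=${3:-.sge62}
    pb_src="$pb_dir/$pb_name"
    pb_saved="$pb_dir/$pb_name$pb_suffix"
    if [ ! -f "$pb_src" ]; then
        echo "no such file: $pb_src" >&2
        return 1
    fi
    if [ -e "$pb_saved" ]; then
        echo "refusing to overwrite $pb_saved" >&2
        return 1
    fi
    # mv only renames the directory entry: shepherds that are already
    # running keep executing the same inode, now named NAME.SUFFIX.
    mv "$pb_src" "$pb_saved" || return 1
    # cp creates a new file (new inode) under the original name; this is
    # the file the patch installation writes over later.
    cp -p "$pb_saved" "$pb_src"
}

# Disable all queues. QMOD defaults to qmod; set QMOD=echo for a dry run.
disable_queues() {
    "${QMOD:-qmod}" -d '*'
}
```

A possible invocation on a single-architecture cluster would be `disable_queues` followed by `preserve_binary "$SGE_ROOT/bin/lx24-amd64" sge_shepherd .sge62`. The helper refuses to run twice, so an earlier saved copy is never clobbered.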
> >
> >   B. Shutting down the Sun Grid Engine daemons
> >   --------------------------------------------
> >
> >   You need to shut down (and restart) the qmaster and scheduler daemon
> >   and all running execution daemons.
> >
> >   Start with the execution hosts. Log in to all your execution hosts and
> >   stop the execution daemons:
> >
> >      # $SGE_ROOT/$SGE_CELL/common/sgeexecd softstop
> >
> >   Then log in to your qmaster machine and stop qmaster and scheduler:
> >
> >      # $SGE_ROOT/$SGE_CELL/common/sgemaster stop
> >
> >   Now verify with the 'ps' command that all Sun Grid Engine daemons on all
> >   hosts are stopped. If you decided to rename the 'sge_shepherd' binary so
> >   that running jobs can continue to run during the patch installation, you
> >   must not kill the 'sge_shepherd' binary (process).
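The 'ps' check can be made mechanical so that it is easy to repeat on every host. This sketch reads process names on stdin and fails if a daemon is still listed, while deliberately allowing `sge_shepherd`, since renamed shepherds of running jobs must be left alone. The daemon names checked (`sge_qmaster`, `sge_schedd`, `sge_execd`) are an assumption about how the processes appear in `ps` output, and `no_sge_daemons` is a hypothetical helper.

```shell
#!/bin/sh
# no_sge_daemons: hypothetical helper, not part of SGE.
# Reads one process name per line on stdin and exits non-zero if any
# assumed SGE daemon name is present. sge_shepherd is intentionally
# ignored so shepherds of still-running jobs do not trigger a failure.
no_sge_daemons() {
    awk '
        $1 == "sge_qmaster" || $1 == "sge_schedd" || $1 == "sge_execd" {
            print "still running: " $1
            found = 1
        }
        END { if (found) exit 1 }
    '
}
```

On each host, something like `ps -e -o comm= | no_sge_daemons && echo "daemons stopped"` would do. Reading from stdin keeps the check independent of how the process list is obtained.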
> >
> >   C. Installing the patch and restarting the software
> >   ---------------------------------------------------
> >
> >   Now install the patch, either with "patchadd" or by unpacking the
> >   'tar.gz' files included in this patch as outlined above.
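For the tar.gz route (the one Mat used for the lx24-amd64 and common packages), each archive is unpacked on top of `$SGE_ROOT`. This minimal sketch assumes the archives are laid out relative to `$SGE_ROOT`. The thread does not give the archive file names, so none are assumed. `unpack_patches` is a hypothetical helper, and it should only be run after step B, with the shepherd renamed if jobs are still running.

```shell
#!/bin/sh
# unpack_patches ROOT ARCHIVE...: hypothetical helper, not part of SGE.
# Unpacks each gzip-compressed tar archive into ROOT, stopping at the
# first failure.
unpack_patches() {
    up_root=$1
    shift
    if [ ! -d "$up_root" ]; then
        echo "not a directory: $up_root" >&2
        return 1
    fi
    for up_archive in "$@"; do
        if [ ! -r "$up_archive" ]; then
            echo "cannot read $up_archive" >&2
            return 1
        fi
        # gzip -dc works even where tar lacks a -z option; tar -p keeps
        # the file modes stored in the archive. gzip reads the archive
        # from the caller's directory; only tar runs inside ROOT.
        gzip -dc "$up_archive" | (cd "$up_root" && tar -xpf -) || return 1
    done
}
```

An invocation would look like `unpack_patches "$SGE_ROOT" <arch patch>.tar.gz <common patch>.tar.gz`, where the angle-bracket names are placeholders for the files shipped with the patch.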
> >
> >      Restarting the software
> >      -----------------------
> >
> >      If you have configured ARCo, you must first complete steps 1 and 2
> >      from the section "Stopping the Accounting and Reporting Console" of
> >      the ARCo patch instructions before restarting the qmaster.
> >
> >      Please log in to your qmaster machine and execution hosts and enter:
> >
> >         # $SGE_ROOT/$SGE_CELL/common/sgemaster
> >         # $SGE_ROOT/$SGE_CELL/common/sgeexecd
> >
> >      After restarting the software, you may again enable your queues:
> >
> >         # qmod -e '*'
> >
> >      If you renamed the shepherd binary, you may safely delete the old
> >      binary once all jobs that were running prior to the patch
> >      installation have finished.
> >
> >
> >Regards,
> >Andy
> >
> >------------------------------------------------------
> >http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122031
> >
> >To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122182
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122210
