[GE users] Minor upgrade from 6.2 to 6.2u2

andy andy.schwierskott at sun.com
Mon Mar 9 09:28:55 GMT 2009


Hi Mat,

we've tested the upgrade from SGE 6.2 to 6.2u2. Everything worked (and should
work by design) without running the upgrade procedure.

"Under the hood" we've changed the CE_consumable type from BOOL to ULONG to
support the "consumable per job" feature.

Our assumption is that an old "libspoolb.so" was still in place or that
LD_LIBRARY_PATH pointed to an older version of that library. On Linux and
Solaris the SGE binaries are compiled with "RUNPATH" to use "libspoolb.so"
from the SGE distribution; however, if LD_LIBRARY_PATH is set, it overrides
this.

Could you please cross-check this?

The SGE startup scripts (unless they are very old or hand-crafted) do *not*
set LD_LIBRARY_PATH for Linux and Solaris.
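
One way to cross-check this, as a minimal sketch: the helper below (the name
find_spool_libs is made up for this illustration and is not part of SGE) walks
a colon-separated search path, by default the current LD_LIBRARY_PATH, and
prints every "libspoolb.so" it finds there. It assumes a POSIX shell and
should be run in the same environment the qmaster is started from.

```shell
# Sketch only: find_spool_libs is a hypothetical helper, not part of SGE.
# Prints each libspoolb.so found in the directories of a colon-separated
# search path. $1 defaults to the current LD_LIBRARY_PATH.
find_spool_libs() {
    search_path=${1-${LD_LIBRARY_PATH-}}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        # Skip empty path elements; report every directory holding the library.
        if [ -n "$dir" ] && [ -e "$dir/libspoolb.so" ]; then
            printf '%s\n' "$dir/libspoolb.so"
        fi
    done
    IFS=$old_ifs
}

# Show what the current environment would pick up (no output means no hits).
find_spool_libs
```

Any hit outside the SGE distribution would be a candidate for the stale
library. On Linux, "readelf -d" on the sge_qmaster binary should also list the
RUNPATH entry the binary was built with.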

Andy

On Fri, 6 Mar 2009, matbradford wrote:

> BDB on local disk.
>
> >-----Original Message-----
> >From: andy [mailto:andy.schwierskott at sun.com]
> >Sent: 06 March 2009 13:22
> >To: users at gridengine.sunsource.net
> >Subject: RE: [GE users] Minor upgrade from 6.2 to 6.2u2
> >
> >Mat,
> >
> >that smells quite fishy.
> >
> >What spooling method did you use? Classic or BDB?
> >
> >Andy
> >
> >
> >On Fri, 6 Mar 2009, matbradford wrote:
> >
> >> Andy,
> >>
> >> I'm possibly doing something stupid...
> >>
> >> I've shut down the cluster, all daemons etc.
> >> I added the patches using the tar.gz method for lx24-amd64 and common,
> >> and when I attempt to restart the sgemaster, it fails and I get the
> >> following message in the messages file:
> >>
> >> <date> main|<host>|ClSetUlong: wrong type for field CE_Consumable (lBoolT)
> >>
> >> I'm running on Suse Enterprise 10 on Xeon.
> >>
> >> Any ideas?
> >>
> >> Cheers,
> >>
> >> Mat
> >>
> >> >-----Original Message-----
> >> >From: andy [mailto:andy.schwierskott at sun.com]
> >> >Sent: 06 March 2009 08:54
> >> >To: users at gridengine.sunsource.net
> >> >Subject: Re: [GE users] Minor upgrade from 6.2 to 6.2u2
> >> >
> >> >Hi Mat,
> >> >
> >> >> What is the upgrade procedure for moving from SGE 6.2 to SGE6.2u2?
> >> >>
> >> >> Do we need to follow the full upgrade procedure as defined in the
> >> >> documentation as if we were moving from 6.1 to 6.2 or is there a
> >> >> simplified process?
> >> >
> >> >
> >> >the spool file formats haven't changed. So running and pending jobs can
> >> >continue to stay in the system. Basically it's the usual thing you have
> >> >to ensure: don't overwrite a binary of a running daemon/process.
> >> >
> >> >See the long version below (taken from the patch installation
> >> >instructions).
> >> >The note about parallel jobs is probably over-cautious. If you also
> >> >rename the qrsh/rsh/rshd/qrsh_starter binaries, I believe a running
> >> >parallel job which has already started its parallel tasks would not be
> >> >affected by an upgrade.
> >> >
> >> >
> >> >Special Install Instructions:
> >> >-----------------------------
> >> >
> >> >   Content
> >> >   -------
> >> >   Patch Installation
> >> >      Stopping the Sun Grid Engine cluster to prevent start of new jobs
> >> >      Shutting down the Sun Grid Engine daemons
> >> >      Installing the patch and restarting the software
> >> >   New functionality delivered with SGE 6.2 Update 2
> >> >
> >> >
> >> >   Patch Installation
> >> >   ------------------
> >> >
> >> >   These installation instructions assume that you are running a homogeneous
> >> >   Sun Grid Engine cluster (called "the software") where all hosts share the
> >> >   same directory for the binaries. If you are running the software in a
> >> >   heterogeneous environment (mix of different binary architectures), you
> >> >   need to apply the patch installation for all binary architectures as well
> >> >   as the "common" and "arco" packages. See the patch matrix above for
> >> >   details about the available patches.
> >> >
> >> >   If you upgrade from a previous version of Sun Grid Engine (for example
> >> >   6.0), please perform the steps described in the Sun Grid Engine
> >> >   documentation
> >> >   (http://wikis.sun.com/display/gridengine62u2/Upgrading).
> >> >
> >> >   If you installed the software on local filesystems, you need to install
> >> >   all relevant patches on all hosts where you installed the software
> >> >   locally.
> >> >
> >> >   By default, there should be no running jobs when the patch is installed.
> >> >   There may be pending batch jobs, but no pending interactive jobs (qrsh,
> >> >   qmake, qsh, qtcsh, qlogin).
> >> >
> >> >   It is possible to install the patch with running batch jobs. To avoid a
> >> >   failure of the active 'sge_shepherd' binary, it is necessary to move the
> >> >   old shepherd binary (and copy it back prior to the installation of the
> >> >   patch).
> >> >
> >> >   You cannot install the patch with running interactive jobs, 'qmake' jobs
> >> >   or with running parallel jobs which use the tight integration support
> >> >   (control_slaves=true is set in the PE configuration).
> >> >
> >> >   A. Stopping the Sun Grid Engine cluster to prevent start of new jobs
> >> >   --------------------------------------------------------------------
> >> >
> >> >   Disable all queues so that no new jobs are started:
> >> >
> >> >      # qmod -d '*'
> >> >
> >> >   Optional (only needed if there are running jobs which should continue to
> >> >   run when the patch is installed):
> >> >
> >> >      # cd $SGE_ROOT/bin
> >> >      # mv <arch>/sge_shepherd <arch>/sge_shepherd.sge62
> >> >      # cp <arch>/sge_shepherd.sge62 <arch>/sge_shepherd
> >> >
> >> >   It is important that the binary is moved with the "mv" command. It should
> >> >   not be copied because this could cause the crash of an active shepherd
> >> >   process which is currently running a job when the patch is installed.
> >> >
> >> >   B. Shutting down the Sun Grid Engine daemons
> >> >   --------------------------------------------
> >> >
> >> >   You need to shut down (and restart) the qmaster and scheduler daemon and
> >> >   all running execution daemons.
> >> >
> >> >   Shut down the execution daemons first. Log in to all your execution
> >> >   hosts and stop the execution daemons:
> >> >
> >> >      # $SGE_ROOT/$SGE_CELL/common/sgeexecd softstop
> >> >
> >> >   Then log in to your qmaster machine and stop qmaster and scheduler:
> >> >
> >> >      # $SGE_ROOT/$SGE_CELL/common/sgemaster stop
> >> >
> >> >   Now verify with the 'ps' command that all Sun Grid Engine daemons on all
> >> >   hosts are stopped. If you decided to rename the 'sge_shepherd' binary so
> >> >   that running jobs can continue to run during the patch installation, you
> >> >   must not kill the running 'sge_shepherd' processes.
> >> >
> >> >   C. Installing the patch and restarting the software
> >> >   ---------------------------------------------------
> >> >
> >> >   Now install the patch with "patchadd" or by unpacking the 'tar.gz' files
> >> >   included in this patch as outlined above.
> >> >
> >> >      Restarting the software
> >> >      -----------------------
> >> >
> >> >      If you have configured ARCo, you must first complete steps 1 and 2
> >> >      from the section "Stopping the Accounting and Reporting Console" from
> >> >      the ARCo patch before restarting the qmaster.
> >> >
> >> >      Please log in to your qmaster machine and execution hosts and enter:
> >> >
> >> >         # $SGE_ROOT/$SGE_CELL/common/sgemaster
> >> >         # $SGE_ROOT/$SGE_CELL/common/sgeexecd
> >> >
> >> >      After restarting the software, you may again enable your queues:
> >> >
> >> >         # qmod -e '*'
> >> >
> >> >      If you renamed the shepherd binary, you may safely delete the old
> >> >      binary when all jobs which were running prior to the patch
> >> >      installation have finished.
> >> >
> >> >
> >> >Regards,
> >> >Andy
> >> >
> >> >------------------------------------------------------
> >> >http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122031
> >> >
> >> >To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>
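
On the "mv" step in the quoted patch instructions: a rename within one
filesystem keeps the inode, so a running shepherd continues to execute the
file it was started from, while the copy placed under the old name is a new
file that the patch can safely replace. A minimal sketch on a scratch file
(inode_of and preserve_binary are hypothetical helper names for this
illustration; it assumes a POSIX shell with "ls -i" and "mktemp"):

```shell
# Sketch only: both helpers are hypothetical and not part of SGE.

# Print the inode number of a file.
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

# Rename the binary, then copy it back under the original name.
# $1: path to the binary, $2: suffix for the preserved file (e.g. sge62)
preserve_binary() {
    mv "$1" "$1.$2" && cp "$1.$2" "$1"
}

# Demonstrate on a scratch file instead of a real sge_shepherd.
demo_dir=$(mktemp -d)
printf 'stand-in for sge_shepherd\n' > "$demo_dir/sge_shepherd"

before=$(inode_of "$demo_dir/sge_shepherd")
preserve_binary "$demo_dir/sge_shepherd" sge62
kept=$(inode_of "$demo_dir/sge_shepherd.sge62")
fresh=$(inode_of "$demo_dir/sge_shepherd")

# The renamed file keeps the original inode; the copy gets a new one.
echo "original=$before renamed=$kept copy=$fresh"

rm -rf "$demo_dir"
```

The renamed file reports the same inode as the original, and the fresh copy
reports a different one, which is why the instructions insist on "mv" followed
by "cp" rather than a plain copy.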

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=125203

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


