[GE users] Minor upgrade from 6.2 to 6.2u2

matbradford matthew.bradford at eds.com
Fri Mar 6 12:41:49 GMT 2009


I'm possibly doing something stupid...

I've shut down the cluster, all daemons etc.
I added the patches using the tar.gz method for lx24-amd64 and common, and
when I attempt to restart sgemaster it fails, and I get the
following message in the messages file:

<date> main|<host>|ClSetUlong: wrong type for field CE_Consumable

I'm running on Suse Enterprise 10 on Xeon.

Any ideas?



>-----Original Message-----
>From: andy [mailto:andy.schwierskott at sun.com]
>Sent: 06 March 2009 08:54
>To: users at gridengine.sunsource.net
>Subject: Re: [GE users] Minor upgrade from 6.2 to 6.2u2
>Hi Mat,
>> What is the upgrade procedure for moving from SGE 6.2 to SGE6.2u2?
>> Do we need to follow the full upgrade procedure as defined in the
>> documentation as if we were moving from 6.1 to 6.2 or is there a
>> simplified process?
>The spool file formats haven't changed, so running and pending jobs can
>continue to stay in the system. Basically it's the usual thing you need to
>ensure: don't overwrite the binary of a running daemon/process.
>See the long version below (taken from the patch installation
>instructions).
>The note about parallel jobs is probably overcautious. If you
>rename the qrsh/rsh/rshd/qrsh_starter binaries as well, I believe a
>parallel job which has already started its parallel tasks would not be
>affected by an upgrade.
>Special Install Instructions:
>   Content
>   -------
>   Patch Installation
>      Stopping the Sun Grid Engine cluster to prevent start of new jobs
>      Shutting down the Sun Grid Engine daemons
>      Installing the patch and restarting the software
>   New functionality delivered with SGE 6.2 Update 2
>   Patch Installation
>   ------------------
>   These installation instructions assume that you are running a
>   Sun Grid Engine cluster (called "the software") where all hosts
>   share the same directory for the binaries. If you are running the
>   software in a heterogeneous environment (mix of different binary
>   architectures), you need to apply the patch installation for all
>   binary architectures as well as the "common" and "arco" packages.
>   See the patch matrix above for details about the available patches.
>   If you upgrade from a previous version of Sun Grid Engine (for
>   example 6.0), please perform the steps described in the Sun Grid Engine
>   documentation.
>   If you installed the software on local filesystems, you need to
>   install all relevant patches on all hosts where you installed the
>   software locally.
>   By default, there should be no running jobs when the patch is
>   installed. There may be pending batch jobs, but no pending interactive
>   jobs (qrsh, qmake, qsh, qtcsh, qlogin).
>   It is possible to install the patch with running batch jobs. To
>   avoid a failure of the active 'sge_shepherd' binary, it is necessary
>   to move the old shepherd binary (and copy it back prior to the
>   installation of the patch).
>   You cannot install the patch with running interactive jobs, 'qmake'
>   or with running parallel jobs which use the tight integration
>   (control_slaves=true is set in the PE configuration).
>   A. Stopping the Sun Grid Engine cluster to prevent start of new jobs
>   --------------------------------------------------------------------
>   Disable all queues so that no new jobs are started:
>      # qmod -d '*'
>   Optional (only needed if there are running jobs which should
>   continue to run when the patch is installed):
>      # cd $SGE_ROOT/bin
>      # mv <arch>/sge_shepherd <arch>/sge_shepherd.sge62
>      # cp <arch>/sge_shepherd.sge62 <arch>/sge_shepherd
>   It is important that the binary is moved with the "mv" command. It
>   must not be copied because this could cause the crash of an active
>   shepherd process which is currently running a job when the patch is
>   installed.
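As a sketch of why the README insists on "mv" followed by "cp": renaming a file within one directory keeps its inode, and a process that is already running keeps using the inode it was started from, whereas the copy placed back under the original name is a new file that the patch can overwrite. The example below is illustrative only. It works on a scratch file in a temporary directory (the file merely borrows the name sge_shepherd) and is not part of the patch instructions.

```shell
#!/bin/sh
# Illustration only: a scratch file stands in for the real sge_shepherd binary.
tmpdir=$(mktemp -d)
printf 'old shepherd\n' > "$tmpdir/sge_shepherd"

# Inode number of the file before anything is renamed.
inode_before=$(ls -i "$tmpdir/sge_shepherd" | awk '{print $1}')

# Step 1: rename. The file gets a new name but stays the same inode.
mv "$tmpdir/sge_shepherd" "$tmpdir/sge_shepherd.sge62"
inode_moved=$(ls -i "$tmpdir/sge_shepherd.sge62" | awk '{print $1}')

# Step 2: copy back. The original name now refers to a different, new inode.
cp "$tmpdir/sge_shepherd.sge62" "$tmpdir/sge_shepherd"
inode_copy=$(ls -i "$tmpdir/sge_shepherd" | awk '{print $1}')

echo "before=$inode_before moved=$inode_moved copy=$inode_copy"

# Remove the scratch directory again.
rm -rf "$tmpdir"
```

The first two inode numbers printed are equal and the third differs, which is the property the rename step relies on.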
>   B. Shutting down the Sun Grid Engine daemons
>   --------------------------------------------
>   You need to shut down (and restart) the qmaster and scheduler daemon
>   and all running execution daemons.
>   Shut down all your execution hosts. Log in to all your execution hosts
>   and stop the execution daemons:
>      # $SGE_ROOT/$SGE_CELL/common/sgeexecd softstop
>   Then log in to your qmaster machine and stop qmaster and scheduler:
>      # $SGE_ROOT/$SGE_CELL/common/sgemaster stop
>   Now verify with the 'ps' command that all Sun Grid Engine daemons on
>   all hosts are stopped. If you decided to rename the 'sge_shepherd'
>   binary so that running jobs can continue to run during the patch
>   installation, you must not kill the 'sge_shepherd' binary (process).
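One possible way to do the 'ps' check on a single host is sketched below. The helper name sge_daemons_in is hypothetical, and the sketch assumes the daemons show up in the process list as sge_qmaster and sge_execd. It deliberately ignores sge_shepherd, since shepherds of running jobs are meant to survive. It only reports and kills nothing.

```shell
#!/bin/sh
# Hypothetical helper: reads "PID COMMAND" lines on stdin and prints those
# whose command name (with any leading path removed) is an SGE daemon.
# Assumption: the daemons appear as sge_qmaster and sge_execd.
sge_daemons_in() {
    awk '{
        n = split($2, parts, "/")
        name = parts[n]
        if (name == "sge_qmaster" || name == "sge_execd") print
    }'
}

# Check the local host; run this on the qmaster and on each execution host.
remaining=$(ps -e -o pid= -o comm= | sge_daemons_in)
if [ -n "$remaining" ]; then
    echo "Still running on $(hostname):"
    echo "$remaining"
else
    echo "No sge_qmaster or sge_execd process found on $(hostname)."
fi
```

Matching on the exact command name rather than on a substring keeps sge_shepherd processes and unrelated commands out of the report.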
>   C. Installing the patch and restarting the software
>   ---------------------------------------------------
>   Now install the patch with "patchadd" or by
>   unpacking the 'tar.gz' files included in this patch as outlined
>      Restarting the software
>      -----------------------
>      If you have configured ARCo, you must first complete steps 1 and
>      from the section "Stopping the Accounting and Reporting Console"
>      the ARCo patch before restarting the qmaster.
>      Please log in to your qmaster machine and execution hosts and run:
>         # $SGE_ROOT/$SGE_CELL/common/sgemaster
>         # $SGE_ROOT/$SGE_CELL/common/sgeexecd
>      After restarting the software, you may again enable your queues:
>         # qmod -e '*'
>      If you renamed the shepherd binary, you may safely delete the old
>      binary when all jobs which were running prior to the patch
>      installation have finished.

