[GE users] Minor upgrade from 6.2 to 6.2u2

matbradford matthew.bradford at eds.com
Mon Mar 9 10:50:31 GMT 2009


Andy,

No joy, I'm afraid. I did have LD_LIBRARY_PATH set to the previous SGE
directory, so I unset it, but I still received the same message when
attempting to start qmaster.

I moved libspoolb.so out of the sge/lib/lx24-amd64 directory, and
sge_qmaster then failed completely with a related message. This shows that
sge_qmaster was attempting to use the correct library.

Any other way of diagnosing this?

Cheers,

Mat
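
One way to follow up on the stale-library theory in the quoted reply below is to list every "libspoolb.so" that the directories in LD_LIBRARY_PATH and the distribution's lib/lx24-amd64 directory would expose, in that order. This is a minimal sketch. It assumes a POSIX shell, SGE_ROOT set to the "sge" directory mentioned above, and the lx24-amd64 architecture. find_lib_copies is a hypothetical helper, not part of SGE. The sketch does not reproduce the dynamic loader exactly (it ignores the system library cache, for example). It can only reveal a stray copy that might shadow the library shipped with 6.2u2.

```shell
#!/bin/sh
# Sketch: list every copy of a library found along a colon-separated
# directory list, in search order. find_lib_copies is a hypothetical helper.
find_lib_copies() (
  lib=$1
  IFS=:      # split the directory list on colons
  set -f     # no filename globbing while splitting
  for dir in $2; do
    # Empty entries (from "::" or a leading ":") are skipped.
    if [ -n "$dir" ] && [ -f "$dir/$lib" ]; then
      printf '%s\n' "$dir/$lib"
    fi
  done
)

# Assumes SGE_ROOT is set and the lx24-amd64 architecture, as in this thread.
# LD_LIBRARY_PATH directories come first because, per the quoted reply, a set
# LD_LIBRARY_PATH overrides the RUNPATH compiled into the SGE binaries.
find_lib_copies libspoolb.so "${LD_LIBRARY_PATH:-}:${SGE_ROOT:-}/lib/lx24-amd64"
```

If the sketch prints a path outside $SGE_ROOT/lib/lx24-amd64, that copy can shadow the distribution's library whenever LD_LIBRARY_PATH is set in the environment that starts qmaster.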

>-----Original Message-----
>From: andy [mailto:andy.schwierskott at sun.com]
>Sent: 09 March 2009 09:29
>To: users at gridengine.sunsource.net
>Subject: RE: [GE users] Minor upgrade from 6.2 to 6.2u2
>
>Hi Mat,
>
>we've tested the upgrade from SGE 6.2. Everything worked (and should work by
>design) without running the upgrade procedure.
>
>"Under the hood" we've changed the CE_consumable type from BOOL to ULONG to
>support the "consumable per job" feature.
>
>Our assumption is that there was still an old "libspoolb.so" in place, or
>that LD_LIBRARY_PATH pointed to an older version of that library. On Linux
>and Solaris the SGE binaries are compiled with "RUNPATH" so that they use the
>"libspoolb.so" from the SGE distribution; however, a set LD_LIBRARY_PATH
>would override this.
>
>Could you please cross-check this?
>
>The SGE startup scripts (unless they are very old or hand-crafted) do *not*
>set LD_LIBRARY_PATH for Linux and Solaris.
>
>Andy
>
>On Fri, 6 Mar 2009, matbradford wrote:
>
>> BDB on local disk.
>>
>> >-----Original Message-----
>> >From: andy [mailto:andy.schwierskott at sun.com]
>> >Sent: 06 March 2009 13:22
>> >To: users at gridengine.sunsource.net
>> >Subject: RE: [GE users] Minor upgrade from 6.2 to 6.2u2
>> >
>> >Mat,
>> >
>> >that smells quite fishy.
>> >
>> >What spooling method did you use? Classic or BDB?
>> >
>> >Andy
>> >
>> >
>> >On Fri, 6 Mar 2009, matbradford wrote:
>> >
>> >> Andy,
>> >>
>> >> I'm possibly doing something stupid...
>> >>
>> >> I've shut down the cluster, all daemons etc., and added the patches
>> >> using the tar.gz method for lx24-amd64 and common. When I attempt to
>> >> restart sgemaster, it fails and I get the following message in the
>> >> messages file:
>> >>
>> >> <date> main|<host>|ClSetUlong: wrong type for field CE_Consumable (lBoolT)
>> >>
>> >> I'm running on Suse Enterprise 10 on Xeon.
>> >>
>> >> Any ideas?
>> >>
>> >> Cheers,
>> >>
>> >> Mat
>> >>
>> >> >-----Original Message-----
>> >> >From: andy [mailto:andy.schwierskott at sun.com]
>> >> >Sent: 06 March 2009 08:54
>> >> >To: users at gridengine.sunsource.net
>> >> >Subject: Re: [GE users] Minor upgrade from 6.2 to 6.2u2
>> >> >
>> >> >Hi Mat,
>> >> >
>> >> >> What is the upgrade procedure for moving from SGE 6.2 to SGE 6.2u2?
>> >> >>
>> >> >> Do we need to follow the full upgrade procedure as defined in the
>> >> >> documentation, as if we were moving from 6.1 to 6.2, or is there a
>> >> >> simplified process?
>> >> >
>> >> >
>> >> >The spool file formats haven't changed, so running and pending jobs can
>> >> >stay in the system. Basically it's the usual thing you have to ensure:
>> >> >don't overwrite the binary of a running daemon/process.
>> >> >
>> >> >See the long version below (taken from the patch installation
>> >> >instructions). The note about parallel jobs is probably over-cautious.
>> >> >If you renamed the qrsh/rsh/rshd/qrsh_starter binaries as well, I
>> >> >believe a running parallel job which has already started its parallel
>> >> >tasks would not be affected by an upgrade.
>> >> >
>> >> >
>> >> >Special Install Instructions:
>> >> >-----------------------------
>> >> >
>> >> >   Content
>> >> >   -------
>> >> >   Patch Installation
>> >> >      Stopping the Sun Grid Engine cluster to prevent start of new jobs
>> >> >      Shutting down the Sun Grid Engine daemons
>> >> >      Installing the patch and restarting the software
>> >> >   New functionality delivered with SGE 6.2 Update 2
>> >> >
>> >> >
>> >> >   Patch Installation
>> >> >   ------------------
>> >> >
>> >> >   These installation instructions assume that you are running a
>> >> >   homogeneous Sun Grid Engine cluster (called "the software") where all
>> >> >   hosts share the same directory for the binaries. If you are running
>> >> >   the software in a heterogeneous environment (mix of different binary
>> >> >   architectures), you need to apply the patch installation for all
>> >> >   binary architectures as well as the "common" and "arco" packages. See
>> >> >   the patch matrix above for details about the available patches.
>> >> >
>> >> >   If you upgrade from a previous version of Sun Grid Engine (for
>> >> >   example 6.0), please perform the steps described in the Sun Grid
>> >> >   Engine documentation
>> >> >   (http://wikis.sun.com/display/gridengine62u2/Upgrading).
>> >> >
>> >> >   If you installed the software on local filesystems, you need to
>> >> >   install all relevant patches on all hosts where you installed the
>> >> >   software locally.
>> >> >
>> >> >   By default, there should be no running jobs when the patch is
>> >> >   installed. There may be pending batch jobs, but no pending
>> >> >   interactive jobs (qrsh, qmake, qsh, qtcsh, qlogin).
>> >> >
>> >> >   It is possible to install the patch with running batch jobs. To
>> >> >   avoid a failure of the active 'sge_shepherd' binary, it is necessary
>> >> >   to move the old shepherd binary (and copy it back) prior to the
>> >> >   installation of the patch.
>> >> >
>> >> >   You cannot install the patch with running interactive jobs, 'qmake'
>> >> >   jobs or running parallel jobs which use the tight integration
>> >> >   support (control_slaves=true is set in the PE configuration).
>> >> >
>> >> >   A. Stopping the Sun Grid Engine cluster to prevent start of new jobs
>> >> >   --------------------------------------------------------------------
>> >> >
>> >> >   Disable all queues so that no new jobs are started:
>> >> >
>> >> >      # qmod -d '*'
>> >> >
>> >> >   Optional (only needed if there are running jobs which should continue
>> >> >   to run when the patch is installed):
>> >> >
>> >> >      # cd $SGE_ROOT/bin
>> >> >      # mv <arch>/sge_shepherd <arch>/sge_shepherd.sge62
>> >> >      # cp <arch>/sge_shepherd.sge62 <arch>/sge_shepherd
>> >> >
>> >> >   It is important that the binary is moved with the "mv" command. It
>> >> >   should not be copied, because this could crash an active shepherd
>> >> >   process which is running a job when the patch is installed.
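
The effect of the "mv" step can be checked on a scratch file. This is a minimal sketch. It assumes a POSIX shell with mktemp(1). The dummy file only stands in for sge_shepherd, and inode_of is a hypothetical helper. Renaming keeps the original inode, which is the file an already running shepherd is using. The copy placed back under the old name is a new file, so the patch can overwrite it without touching the running binary.

```shell
#!/bin/sh
# Sketch: why the shepherd binary is renamed with "mv" before patching.
# A dummy file stands in for sge_shepherd; nothing SGE-specific is touched.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'old shepherd\n' > "$dir/sge_shepherd"

# Print the inode number of a file (first field of "ls -i").
inode_of() { ls -i "$1" | awk '{ print $1 }'; }

before=$(inode_of "$dir/sge_shepherd")

# The two steps from the instructions: rename, then copy back.
mv "$dir/sge_shepherd" "$dir/sge_shepherd.sge62"
cp "$dir/sge_shepherd.sge62" "$dir/sge_shepherd"

# The renamed file still has the original inode (the one a running shepherd
# has open); the name sge_shepherd now refers to a new file with a new inode.
echo "original inode:     $before"
echo "sge_shepherd.sge62: $(inode_of "$dir/sge_shepherd.sge62")"
echo "sge_shepherd:       $(inode_of "$dir/sge_shepherd")"
```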
>> >> >
>> >> >   B. Shutting down the Sun Grid Engine daemons
>> >> >   --------------------------------------------
>> >> >
>> >> >   You need to shut down (and restart) the qmaster and scheduler daemon
>> >> >   and all running execution daemons.
>> >> >
>> >> >   Shut down the execution daemons first. Log in to all your execution
>> >> >   hosts and stop them:
>> >> >
>> >> >      # $SGE_ROOT/$SGE_CELL/common/sgeexecd softstop
>> >> >
>> >> >   Then log in to your qmaster machine and stop qmaster and scheduler:
>> >> >
>> >> >      # $SGE_ROOT/$SGE_CELL/common/sgemaster stop
>> >> >
>> >> >   Now verify with the 'ps' command that all Sun Grid Engine daemons on
>> >> >   all hosts are stopped. If you decided to rename the 'sge_shepherd'
>> >> >   binary so that running jobs can continue to run during the patch
>> >> >   installation, you must not kill the 'sge_shepherd' binary (process).
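
A per-host version of the 'ps' check could look like the following minimal sketch. It assumes the daemon process names sge_qmaster and sge_execd, so adjust the list for your installation. It deliberately ignores sge_shepherd, because shepherds of still running jobs must stay alive if the binary was renamed. remaining_daemons is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: report Sun Grid Engine daemons still running on this host.
# Assumed daemon names: sge_qmaster (master host), sge_execd (execution hosts).
# sge_shepherd is intentionally not matched: running jobs' shepherds must survive.

# Read one command name per line on stdin; print the SGE daemons found.
remaining_daemons() {
  awk '$1 == "sge_qmaster" || $1 == "sge_execd" { print $1 }' | sort -u
}

# "ps -e -o comm=" prints the command name of every process, without a header.
left=$(ps -e -o comm= | remaining_daemons)
if [ -n "$left" ]; then
  echo "still running on $(uname -n):" $left
else
  echo "no SGE daemons running on $(uname -n)"
fi
```

Run it on every host before unpacking the patch. Any daemon name it prints means that host is not ready yet.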
>> >> >
>> >> >   C. Installing the patch and restarting the software
>> >> >   ---------------------------------------------------
>> >> >
>> >> >   Now install the patch with "patchadd" or by unpacking the 'tar.gz'
>> >> >   files included in this patch as outlined above.
>> >> >
>> >> >      Restarting the software
>> >> >      -----------------------
>> >> >
>> >> >      If you have configured ARCo, you must first complete steps 1 and 2
>> >> >      from the section "Stopping the Accounting and Reporting Console"
>> >> >      of the ARCo patch before restarting the qmaster.
>> >> >
>> >> >      Please log in to your qmaster machine and execution hosts and
>> >> >      enter:
>> >> >
>> >> >         # $SGE_ROOT/$SGE_CELL/common/sgemaster
>> >> >         # $SGE_ROOT/$SGE_CELL/common/sgeexecd
>> >> >
>> >> >      After restarting the software, you may again enable your queues:
>> >> >
>> >> >         # qmod -e '*'
>> >> >
>> >> >      If you renamed the shepherd binary, you may safely delete the old
>> >> >      binary when all jobs which were running prior to the patch
>> >> >      installation have finished.
>> >> >
>> >> >
>> >> >Regards,
>> >> >Andy
>> >> >
>> >> >------------------------------------------------------
>> >> >http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122031
>> >> >
>> >> >To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=125274

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


