[GE users] sge_execd says it starts but it doesn't start

futurity neil at futurity.co.uk
Tue Apr 27 17:54:42 BST 2010


Hi rems0,

Found the problem thanks to your help :)

There was nothing in $SGE_ROOT/$SGE_CELL/spool/`hostname`/messages, because a local spool directory had been configured.

However, there were some message files in /tmp. I have to confess that I never thought to look there.
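
For anyone retracing this, here is a minimal sketch that checks both places in one go. The function name find_execd_messages and the '*messages*' file name pattern are assumptions made for illustration, not something Grid Engine provides; adjust the pattern to whatever file names actually turn up on your hosts.

```shell
# List candidate sge_execd message files in the spool directory and in a
# fallback directory (default /tmp).
# Assumption: the log files have "messages" somewhere in their name.
find_execd_messages() {
    spool_dir=$1
    tmp_dir=${2:-/tmp}
    for dir in "$spool_dir" "$tmp_dir"; do
        # Skip locations that do not exist (e.g. a spool dir never created)
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 2 -type f -name '*messages*' 2>/dev/null
    done
}
```

It could be called as: find_execd_messages "$SGE_ROOT/$SGE_CELL/spool/$(hostname)" /tmp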

The message said:

04/27/2010 15:32:03|  main|stg-zoom1|C|can't create directory "stg-zoom1": No such file or directory

Assuming that it needs to create this local spool directory, it was either a permission problem or the parent directory wasn't there.
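
The message line above looks like "|" separated fields (timestamp, thread, host, level, text), with "C" in the level field. As a rough sketch under that assumption, the severe entries can be pulled out of a messages file like this. The function name sge_severe_messages is made up for illustration, and treating "E" as the error level is also an assumption.

```shell
# Print only the lines whose fourth "|" separated field is C or E.
# Assumed layout: timestamp|thread|host|level|text
# Reads the files given as arguments, or stdin if none are given.
sge_severe_messages() {
    awk -F'|' '$4 == "C" || $4 == "E"' "$@"
}
```

For example: sge_severe_messages /tmp/*messages*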

It turns out that I needed to create the parent directories right down to the spool directory.  Doing the following as root solved the problem:


mkdir -p /local/sge/spool
chown -R sgeadmin62:sgeadmin62 /local




where "/local" is the local spool directory I specified and "sgeadmin62" is the grid engine admin user.
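
The same two steps wrapped in a small function, as a sketch only. The name prepare_spool_dir is hypothetical, the path and owner are whatever was chosen at install time, and it needs to run as root when the owner differs from the current user.

```shell
# Create a local spool directory (including missing parents) and hand it
# to the grid engine admin user.
#   $1 = spool directory, e.g. /local/sge/spool
#   $2 = owner as user:group, e.g. sgeadmin62:sgeadmin62
prepare_spool_dir() {
    spool_dir=$1
    owner=$2
    mkdir -p "$spool_dir" || return 1
    chown -R "$owner" "$spool_dir" || return 1
    # Report the result so a failed setup is visible before starting execd
    ls -ld "$spool_dir"
}
```

Unlike the chown on /local above, this only changes ownership of the directory passed in, so parent directories keep their existing owner.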

It all starts perfectly now.  Thank you so much for your help.  I hope this solution helps someone else in the future.

Neil


On 27 April 2010 17:20, rems0 <Richard.Ems at cape-horn-eng.com> wrote:
Hi Neil,

the ldd output seems ok to me.
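
(What to look for in ldd output: a library that cannot be resolved is shown as "=> not found". A small sketch that lists only those; the function name missing_libs is made up for illustration.)

```shell
# Read ldd output on stdin and print the names of unresolved libraries.
# Prints nothing when every library was found.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Usage: ldd /rmt/sge62/bin/lx24-x86/sge_execd | missing_libs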

There is also a log file created by sge_execd itself:

$SGE_ROOT/$SGE_CELL/spool/`hostname`/messages

Is there anything there, or in the /tmp directory?

We use openSUSE 11.2 on the SGE server and still openSUSE 11.1 on the
nodes running sgeexecd without problems, all very stable. (But be aware
of bug http://gridengine.sunsource.net/issues/show_bug.cgi?id=3194,
sge_shepherd segfault on openSUSE 11.2!)



On 04/27/2010 06:06 PM, futurity wrote:
> Hi rems0,
>
> Thank you for your quick reply.
>
> We're really happy with openSuse10.3 as we've found it to be bug free
> and very stable.  We've had some issues with openSuse11.0 and 11.1 which
> is why we're still using openSuse10.3. As these servers aren't on the
> internet and we have our own local copy of the update repository we've
> found it to be a very nice OS, but I agree there are newer perhaps
> better Linux distros out there.
>
> openSuse10.3 worked perfectly for grid engine 61u3 although this doesn't
> automatically mean it'll work for 62u5.  I thought openSuse10.3 met the
> requirements for grid engine 6.2u5.  Is there a known working Linux
> distribution that is recommended?
>
> Running the following gives:
>
> ldd /rmt/sge62/bin/lx24-x86/sge_execd
>
>         linux-gate.so.1 =>  (0xffffe000)
>         libdl.so.2 => /lib/libdl.so.2 (0xb7f80000)
>         libm.so.6 => /lib/libm.so.6 (0xb7f5b000)
>         libpthread.so.0 => /lib/libpthread.so.0 (0xb7f44000)
>         libcore.so => /rmt/sge62/bin/lx24-x86/../../lib/lx24-x86/libcore.so (0xb7f42000)
>         libc.so.6 => /lib/libc.so.6 (0xb7e0f000)
>         /lib/ld-linux.so.2 (0xb7fa1000)
>
> Unfortunately I don't understand this output.  Is it ok?
>
> sge_execd doesn't appear to have logged anything to /var/log/messages :(
>
> Many thanks for your help.
>
> Neil
>
> On 27 April 2010 16:49, rems0 <Richard.Ems at cape-horn-eng.com> wrote:
>
>     Hi Neil,
>
>     openSUSE 10.3 is really old and was discontinued on October 31st, 2009,
>     see
>     http://en.opensuse.org/SUSE_Linux_Lifetime#Discontinued_Distributions.
>
>
>     What does " ldd /rmt/sge62/bin/lx24-x86/sge_execd " report ?
>     Any message in /var/log/messages ?
>
>     Richard
>
>     On 04/27/2010 05:29 PM, futurity wrote:
>     > Hi,
>     >
>     > I'm in the process of installing a new grid with the aim of migrating
>     > machines from our 61 grid to 62u5.
>     >
>     > Unfortunately the sge_execd process doesn't seem to start on our
>     > execution host machines.
>     >
>     > The qmaster installed without any problems (on openSuse 10.3 32bit) and
>     > when started using "/etc/init.d/sgemaster.p6444 start" the process works
>     > fine.  qstat, qhost etc all work fine.
>     >
>     > The sge_execd installed without any problems (again on openSuse 10.3
>     > 32bit) and when started using "/etc/init.d/sgeexecd.p6444 start" it says
>     > it started, but the process just isn't running.  qhost lists the new
>     > execution host, but with dashes against the new host (not the details as
>     > expected).
>     >
>     > I've even tried running "/rmt/sge62/bin/lx24-x86/sge_execd" as user
>     > sgeadmin62 (with the correct environment) and no errors are reported,
>     > but again the process isn't running.
>     >
>     > The only non default value used during the sge_execd install was the
>     > spool directory for which I entered "/local".  I had previously made a
>     > directory "/local" on the local disk and chmod'ed it to 777 (still owned
>     > by root).  Again it said this was fine, but sge_execd didn't actually
>     > make any sub directories or log any messages to files within it (during
>     > the install stage or while being run).
>     >
>     > Any idea what could be going on?  Is there a way to turn on any debug
>     > for sge_execd so I can see what's going on?
>     >
>     > Kind Regards
>     >
>     > Neil
>
>
>     --
>     Richard Ems       mail: Richard.Ems at Cape-Horn-Eng.com
>
>     Cape Horn Engineering S.L.
>     C/ Dr. J.J. Dómine 1, 5º piso
>     46011 Valencia
>     Tel : +34 96 3242923 / Fax 924
>     http://www.cape-horn-eng.com
>
>     ------------------------------------------------------
>     http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=255144
>
>     To unsubscribe from this discussion, e-mail:
>     [users-unsubscribe at gridengine.sunsource.net].
>
>


--
Richard Ems       mail: Richard.Ems at Cape-Horn-Eng.com

Cape Horn Engineering S.L.
C/ Dr. J.J. Dómine 1, 5º piso
46011 Valencia
Tel : +34 96 3242923 / Fax 924
http://www.cape-horn-eng.com

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=255149

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].



