[GE users] sge_execd says it starts but it doesn't start
neil at futurity.co.uk
Tue Apr 27 17:54:42 BST 2010
Found the problem thanks to your help :)
There was nothing in $SGE_ROOT/$SGE_CELL/spool/`hostname`/messages because a local spool directory was configured.
However, there were some message files in /tmp. I have to confess that I never thought to look there.
The message said:
04/27/2010 15:32:03| main|stg-zoom1|C|can't create directory "stg-zoom1": No such file or directory
Assuming that it needs to create this local spool directory, it was either a permission problem or the parent directory wasn't there.
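For anyone hunting for the same kind of entry, here is a minimal sketch of how the critical lines could be pulled out of such message files. It assumes the "|"-separated layout seen in the line above (timestamp, thread, host, level, message); the function name sge_critical_messages is made up for illustration and is not part of Grid Engine.

```shell
# Print timestamp and text of every critical ("C") entry in the given
# messages files. Assumed field layout, taken from the line quoted above:
#   timestamp|thread|host|level|message
sge_critical_messages() {
    awk -F'|' '$4 == "C" {
        msg = $5
        # Re-join the message in case it contains "|" characters itself.
        for (i = 6; i <= NF; i++) msg = msg "|" $i
        print $1 " " msg
    }' "$@"
}
```

It could be called with whatever message files turn up, for example `sge_critical_messages /tmp/*messages*`; that glob is only a guess at the file names, so adjust it to what is actually in /tmp.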
It turns out that I needed to create the parent directories right down to the spool directory. Doing the following as root solved the problem:
mkdir -p /local/sge/spool
chown -R sgeadmin62:sgeadmin62 /local
where "/local" is the local spool directory I specified and "sgeadmin62" is the grid engine admin user.
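A rough way to check this before starting sge_execd is sketched below. The helper names first_existing_parent and spool_dir_ready are hypothetical, and the check only tests what the invoking user can do, so it would need to be run as the grid engine admin user (sgeadmin62 here) to be meaningful.

```shell
# Print the deepest ancestor of a path that already exists as a directory,
# i.e. the place under which the first missing component must be created.
first_existing_parent() {
    _p=$1
    while [ ! -d "$_p" ] && [ "$_p" != "/" ] && [ "$_p" != "." ]; do
        _p=$(dirname "$_p")
    done
    printf '%s\n' "$_p"
}

# Succeed only if the directory exists and the current user can create
# entries in it (write and search permission).
spool_dir_ready() {
    [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}
```

For example, `spool_dir_ready /local/sge/spool || first_existing_parent /local/sge/spool` prints the directory where the chain breaks when the spool directory is missing or not writable.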
It all starts perfectly now. Thank you so much for your help. I hope this solution helps someone else in the future.
On 27 April 2010 17:20, rems0 <Richard.Ems at cape-horn-eng.com> wrote:
the ldd output seems ok to me.
there is also a log file created by sgeexecd itself,
$SGE_ROOT/$SGE_CELL/spool/`hostname`/messages , anything there?
Or in the /tmp directory?
We use openSUSE 11.2 on the SGE server and still openSUSE 11.1 on the
nodes running sgeexecd without problems, all very stable. (But be aware
of bug http://gridengine.sunsource.net/issues/show_bug.cgi?id=3194,
sge_shepherd segfault on openSUSE 11.2!)
On 04/27/2010 06:06 PM, futurity wrote:
> Hi rems0,
> Thank you for your quick reply.
> We're really happy with openSUSE 10.3 as we've found it to be bug-free
> and very stable. We've had some issues with openSUSE 11.0 and 11.1, which
> is why we're still using openSUSE 10.3. As these servers aren't on the
> internet and we have our own local copy of the update repository, we've
> found it to be a very nice OS, but I agree there are newer, perhaps
> better, Linux distros out there.
> openSUSE 10.3 worked perfectly for grid engine 61u3, although this doesn't
> automatically mean it'll work for 62u5. I thought openSUSE 10.3 met the
> requirements for grid engine 6.2u5. Is there a known working Linux
> distribution that is recommended?
> Running the following gives:
> ldd /rmt/sge62/bin/lx24-x86/sge_execd
> linux-gate.so.1 => (0xffffe000)
> libdl.so.2 => /lib/libdl.so.2 (0xb7f80000)
> libm.so.6 => /lib/libm.so.6 (0xb7f5b000)
> libpthread.so.0 => /lib/libpthread.so.0 (0xb7f44000)
> libcore.so => /rmt/sge62/bin/lx24-x86/../../lib/lx24-x86/libcore.so (0xb7f42000)
> libc.so.6 => /lib/libc.so.6 (0xb7e0f000)
> /lib/ld-linux.so.2 (0xb7fa1000)
> Unfortunately I don't understand this output. Is it ok?
> sge_execd doesn't appear to have logged anything to /var/log/messages :(
> Many thanks for your help.
> On 27 April 2010 16:49, rems0 <Richard.Ems at cape-horn-eng.com> wrote:
> Hi Neil,
> openSUSE 10.3 is really old and was discontinued on October 31st 2009.
> What does " ldd /rmt/sge62/bin/lx24-x86/sge_execd " report ?
> Any message in /var/log/messages ?
> On 04/27/2010 05:29 PM, futurity wrote:
> > Hi,
> > I'm in the process of installing a new grid with the aim of migrating
> > machines from our 61 grid to 62u5.
> > Unfortunately the sge_execd process doesn't seem to start on our
> > execution host machines.
> > The qmaster installed without any problems (on openSuse 10.3 32bit) and
> > when started using "/etc/init.d/sgemaster.p6444 start" the process
> > starts fine. qstat, qhost etc all work fine.
> > The sge_execd installed without any problems (again on openSuse 10.3
> > 32bit) and when started using "/etc/init.d/sgeexecd.p6444 start" it says
> > it started, but the process just isn't running. qhost lists the new
> > execution host, but with dashes against the new host (not the details as
> > expected).
> > I've even tried running "/rmt/sge62/bin/lx24-x86/sge_execd" as user
> > sgeadmin62 (with the correct environment) and no errors are reported,
> > but again the process isn't running.
> > The only non-default value used during the sge_execd install was the
> > spool directory, for which I entered "/local". I had previously made a
> > directory "/local" on the local disk and chmod'ed it to 777 (still
> > owned by root). Again it said this was fine, but sge_execd didn't actually
> > make any sub directories or log any messages to files within it
> > (during the install stage or while being run).
> > Any idea what could be going on? Is there a way to turn on any debug
> > for sge_execd so I can see what's going on?
> > Kind Regards
> > Neil
> Richard Ems mail: Richard.Ems at Cape-Horn-Eng.com
> Cape Horn Engineering S.L.
> C/ Dr. J.J. Dómine 1, 5º piso
> 46011 Valencia
> Tel : +34 96 3242923 / Fax 924
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].