[GE users] sge_execd says it starts but it doesn't start
neil at futurity.co.uk
Tue Apr 27 16:29:44 BST 2010
I'm installing a new grid with the aim of migrating machines from our 6.1 grid to 6.2u5.
Unfortunately the sge_execd process doesn't seem to start on our execution host machines.
The qmaster installed without any problems (on openSUSE 10.3 32-bit), and when started using "/etc/init.d/sgemaster.p6444 start" the process works fine. qstat, qhost etc. all work fine.
The sge_execd installed without any problems (again on openSUSE 10.3 32-bit), and when started using "/etc/init.d/sgeexecd.p6444 start" it says it started, but the process just isn't running. qhost lists the new execution host, but shows only dashes for it instead of the expected details.
I've even tried running "/rmt/sge62/bin/lx24-x86/sge_execd" as user sgeadmin62 (with the correct environment) and no errors are reported, but again the process isn't running.
The only non-default value used during the sge_execd install was the spool directory, for which I entered "/local". I had previously made a directory "/local" on the local disk and chmod'ed it to 777 (still owned by root). Again the installer said this was fine, but sge_execd didn't actually make any subdirectories or log any messages to files within it (during the install stage or while being run).
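For reference, the two checks described above (is the spool directory usable, and is the daemon actually running) can be sketched as a small shell script. This is only a rough sketch: the helper names check_spool_dir and check_process and the SPOOL_DIR variable are made up for illustration and are not part of Grid Engine, and the script needs to be run as the admin user (sgeadmin62 here) for the write test to mean anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): report whether a directory
# exists and is writable and searchable by the user running this script.
check_spool_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"
    elif [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "writable"
    else
        echo "not-writable"
    fi
}

# Hypothetical helper: report whether a process with exactly this name
# is currently running, using pgrep's exact-match option.
check_process() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "running"
    else
        echo "not-running"
    fi
}

# SPOOL_DIR is an assumed variable name; it defaults to the "/local"
# directory entered during the sge_execd install.
echo "spool dir: $(check_spool_dir "${SPOOL_DIR:-/local}")"
echo "sge_execd: $(check_process sge_execd)"
```

Running it straight after "/etc/init.d/sgeexecd.p6444 start" shows whether the daemon exited immediately even though the init script reported success.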
Any idea what could be going on? Is there a way to turn on debug output for sge_execd so I can see what it's doing?