[GE users] installing exec host hangs at start of sge_commd

Andy Schwierskott andy.schwierskott at sun.com
Tue Jul 6 13:51:05 BST 2004


Hi,

> I did that after the hang to see which of the three processes (execd,
> commd and install script) was the one causing the problem but all three
> were in a wait state. At one point I tried to trace the startup of the
> execd daemon, and that gave info that I could have examined, but
> installing the execd on the qmaster seemed to jump me past that problem
> and I never went back to it. I could try putting an strace on the script
> execution and use -f to follow forks.
>
> Bottom line: is it preferable to install as an SGE admin user or as
> root?

This is not related to your install problems. In general you should select
an admin user during qmaster installation.

Is it the commd that is hanging, or the execd? Try to start commd manually
(set SGE_ROOT and, if applicable, COMMD_PORT and SGE_CELL).

If commd starts, then please start execd. With a commd already running,
execd should not try to start one itself.
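
As a rough illustration of the manual start described above, here is a
minimal Python sketch. The helper names (commd_environment, daemon_path,
start_daemon) are hypothetical and not part of Grid Engine; the
<sge_root>/bin/<arch>/<name> layout is taken from the install output quoted
further down, and it is an assumption that the daemons need no extra
command-line arguments.

```python
import os
import subprocess


def commd_environment(sge_root, commd_port=None, sge_cell=None):
    """Copy the current environment and add the Grid Engine variables."""
    env = dict(os.environ)
    env["SGE_ROOT"] = sge_root
    # COMMD_PORT and SGE_CELL are only set "if applicable".
    if commd_port is not None:
        env["COMMD_PORT"] = str(commd_port)
    if sge_cell is not None:
        env["SGE_CELL"] = sge_cell
    return env


def daemon_path(sge_root, arch, name):
    """Build <sge_root>/bin/<arch>/<name>, the layout seen in the install output."""
    return f"{sge_root}/bin/{arch}/{name}"


def start_daemon(sge_root, arch, name, env):
    """Launch one daemon binary with the prepared environment (hypothetical helper)."""
    # Assumption: the binary is started without any arguments.
    return subprocess.Popen([daemon_path(sge_root, arch, name)], env=env)
```

With the paths from this thread, one would build the environment with
commd_environment("/tools/sge/default", sge_cell="default"), call
start_daemon("/tools/sge/default", "glinux", "sge_commd", env) first, and
only start "sge_execd" the same way once commd is confirmed to be running.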

Are you sure your problem is not related to an /etc/hosts file where the
hostname of the machine is aliased to the loopback address 127.0.0.1?
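
To make the /etc/hosts check concrete, here is a small Python sketch that
scans hosts-file text for lines mapping a given hostname to a loopback
address. The function name loopback_aliases is made up for this example,
and the parsing assumes the usual "address name [alias ...]" line format
with "#" comments.

```python
import ipaddress


def loopback_aliases(hosts_text, hostname):
    """Return the hosts-file lines that map `hostname` to a loopback address."""
    wanted = hostname.lower()
    hits = []
    for line in hosts_text.splitlines():
        # Drop comments, then split into the address and its names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        address, names = fields[0], [n.lower() for n in fields[1:]]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            # Not a parseable IP address; skip the line.
            continue
        if is_loopback and wanted in names:
            hits.append(line.strip())
    return hits
```

On the exec host this could be run as
loopback_aliases(open("/etc/hosts").read(), socket.gethostname()) after
importing socket; a non-empty result means the machine's own name resolves
to the loopback address through /etc/hosts.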

Andy


>
> Don
>
> -----Original Message-----
>
> The best to debug this is to use "strace -p <pid>" to find out where it
> is hanging.
>
>  -Ron
>
> --- Don Shesnicky <dshesnicky at enqsemi.com> wrote:
> > I've installed 5.3p6 on Red Hat, with the master on a stock 7.2 host
> > with the 2.4.7-10smp kernel.
> > Hardware is dual-CPU Supermicros with Xeon processors. I ran the
> > qmaster install as root with an admin user of sgeadmin.
> > The directory is an NFS mount of /tools/sge/default, with the leaf
> > being a link to the actual /tools/sge/5.3p6 directory, so the default
> > cell would be /tools/sge/default/default.
> >
> > Now I'm trying to install the first exec host as root and it's hanging
> > while starting the daemons:
> >
> >    Grid Engine execution daemon startup
> >    ------------------------------------
> >    Starting execution daemon daemon. Please wait ...
> >       starting sge_execd
> >    starting program: /tools/sge/default/bin/glinux/sge_commd
> >    using service "sge_commd"
> >    bound to port 536
> >    <hangs forever>
> >
> > If I do a ps -efww I do see both the execd and commd running.
> >
> > One note is that I didn't see in the instructions that I had to make
> > the qmaster an exec host as well. I ran that install after trying to
> > debug the above for a bit. Also, root does not have permissions to
> > /tools/sge.
> >
> > Any direction would be appreciated.
> >
> > Don
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net


Regards,
Mit freundlichen Gruessen,
Andy Schwierskott

--
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Andy Schwierskott           Tel:     +49 941 3075-200  (x60200)
N1 Grid Engine Engineering  Support: +49 941 3075-250  (x60250)
Sun Microsystems GmbH       Fax:     +49 941 3075-222  (x60222)
Dr.-Leo-Ritter-Str. 7       mailto:andy.schwierskott at sun.com
D-93049 Regensburg          http://www.sun.com/gridware




