[GE users] Problems installing 6.0u1

Christophe Dupre duprec at scorec.rpi.edu
Wed Oct 6 17:27:57 BST 2004


I've double-checked my configuration. My hostname matches the IP address
of one of the Ethernet interfaces, and the reverse mapping of that IP
address gives the machine's hostname.
I still get the same error when starting SGE, but I have now noticed a
core file in /tmp:
bash-2.05b# file core.8264
core.8264: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
SVR4-style, from 'sge_qmaster'

and a gdb backtrace on it gives:
#0  0x3de88ef1 in pthread_cancel () from /lib/tls/libpthread.so.0
#1  0x0815bc3f in cl_thread_shutdown ()
#2  0x0815c276 in cl_thread_list_create_thread ()
#3  0x08147716 in cl_com_create_handle ()
#4  0x080d6544 in prepare_enroll ()
#5  0x0806456e in main ()
#6  0x3dd5e768 in __libc_start_main () from /lib/tls/libc.so.6

I'm not sure what my next move should be.
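
For anyone wanting to repeat the forward/reverse check described above, here
is a minimal Python sketch. It is not part of SGE: the function name
check_forward_reverse and its injectable forward/reverse parameters are
hypothetical, added so the logic can be exercised without a live DNS. It only
reflects the resolver view of the machine it runs on, which may differ from
what sge_qmaster itself sees.

```python
import socket


def check_forward_reverse(hostname, forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve that IP back, and compare names.

    Returns (ok, detail). The forward/reverse arguments default to the
    system resolver and exist only so the check can be tested offline.
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror are OSErrors
        return False, f"forward lookup of {hostname} failed: {exc}"
    try:
        primary, aliases, _addresses = reverse(ip)
    except OSError as exc:
        return False, f"reverse lookup of {ip} failed: {exc}"
    # Accept a match on either the primary name or any alias, ignoring case.
    names = {primary.lower()} | {alias.lower() for alias in aliases}
    if hostname.lower() in names:
        return True, f"{hostname} -> {ip} -> {primary}"
    return False, f"{hostname} -> {ip} -> {primary} (mismatch)"
```

Running check_forward_reverse(socket.getfqdn()) on the master, and again with
the name stored in act_qmaster, should show whether the two lookups agree.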

On Tue, 5 Oct 2004, Chris Dagdigian wrote:

> These sorts of errors are almost always caused by configuration problems
> relating to hostnames, hostname resolution and DNS configuration. SGE is
> very sensitive to these sorts of things.
>
> In particular your server named "master.medusa.scorec.rpi.edu" should be
> listed in your campus DNS server for both forward and reverse queries.
>
> The hostname of the acting qmaster is going to be written to your
> $SGE_ROOT/<cell>/common/act_qmaster file.
>
> Your compute nodes on the private network will read that file and try to
> contact the hostname listed. If that hostname is the public name and is
> unreachable via the private network you are going to have issues. The
> fix for that situation is to create the file
> $SGE_ROOT/<cell>/common/host_aliases with an entry for
> "master.medusa.scorec.rpi.edu <internal-hostname>" -- this will let the
> private nodes contact the correct IP address.
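
To make the alias mapping concrete, here is a small Python sketch of how such
an entry would be interpreted. It assumes the format as I understand it from
host_aliases(5): one host per line, names separated by blanks, commas or
semicolons, with the first name on the line being the unique host name that
the others map to. The function parse_host_aliases and the name
"master-internal" are hypothetical and used only for illustration.

```python
import re


def parse_host_aliases(text):
    """Map every name on a line to the first (unique) name on that line."""
    mapping = {}
    for line in text.splitlines():
        # Assumed separators: blanks, commas or semicolons.
        names = [n for n in re.split(r"[\s,;]+", line.strip()) if n]
        if not names:
            continue  # skip empty lines
        unique_name = names[0]
        for name in names:
            mapping[name.lower()] = unique_name
    return mapping


# "master-internal" is a placeholder for the private-network name.
aliases = parse_host_aliases("master.medusa.scorec.rpi.edu master-internal\n")
print(aliases["master-internal"])
```

Under that assumption, both names end up referring to the same host as far as
the lookup is concerned.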
>
>
> If this problem is not caused by hostname/DNS issues then it could also
> be something simple like a firewall blocking your TCP port etc.
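
A quick way to test the TCP side is a plain connection attempt to the qmaster
port. The following is a minimal Python sketch, not an SGE tool; the function
name can_connect and the 3-second default timeout are arbitrary choices for
this example.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False
```

For the setup in this thread the call would be
can_connect("master.medusa.scorec.rpi.edu", 536). A False result does not by
itself point to a firewall: the connection is also refused when nothing is
listening, for example because sge_qmaster exited right after starting.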
>
> -Chris
>
> Christophe Dupre wrote:
>
> > I am trying to install 6.0u1 on a cluster running RHEL 3.0. The master
> > node is attached to a private network with the compute nodes, and to a
> > public network for end-user access.
> > When I try to start the daemons, I get:
> > bash-2.05b# /etc/init.d/sgemaster  start
> >    starting sge_qmaster
> >
> > sge_qmaster didn't start!
> > Please check the messages file
> >
> >    starting sge_schedd
> > error: getting configuration: unable to contact qmaster using port 536 on
> > host "master.medusa.scorec.rpi.edu"
> > can't get configuration from qmaster -- waiting ...
> > can't get configuration from qmaster -- waiting ...
> > can't get configuration from qmaster -- waiting ...
> > error: can't get configuration from qmaster -- backgrounding
> >
> > but the messages file is empty.
> >
> > I ran the install_qmaster script and followed the instructions, but the
> > script failed while trying to start the qmaster daemon.
> >
> > --
> > Christophe Dupre
> > System Administrator, Scientific Computation Research Center
> > Rensselaer Polytechnic Institute
> > Troy, NY        USA
> > Phone: (518) 276-2578  -  Fax: (518) 276-4886
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
>
> --
> Chris Dagdigian, <dag at sonsorol.org>
> BioTeam  - Independent life science IT & informatics consulting
> Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
> PGP KeyID: 83D4310E iChat/AIM: bioteamdag  Web: http://bioteam.net
>


--
Christophe Dupre
System Administrator, Scientific Computation Research Center
Rensselaer Polytechnic Institute
Troy, NY        USA
Phone: (518) 276-2578  -  Fax: (518) 276-4886
