[GE users] comlib errors with qmaster installation

Ken Tang kentang at berkeley.edu
Fri Feb 23 06:36:21 GMT 2007





Jeevan wrote:
> Hi Ken,
> Is there anything you did to ubuntu to get it working ?
> Like changing the iptables config etc?
>
> I have added the entries in the /etc/services file. I still have no luck.
> Regards,
> Jeevan
>
>
>
> Ken Tang wrote:
> > Jeevan Kumar Ramakrishna wrote:
> >   
> >> Hi all,
> >> I am trying to install the Grid Engine 6 u9 on my ubuntu 6.06 /i586 machine. The errors I get during the installation are.
> >> <snip>
> >> Grid Engine qmaster and scheduler startup
> >> -----------------------------------------
> >>
> >> Starting qmaster and scheduler daemon. Please wait ...
> >>    starting sge_qmaster
> >>
> >> sge_qmaster didn't start!
> >> Please check the messages file
> >>
> >>    starting sge_schedd
> >> error: commlib error: can't connect to service (Connection refused)
> >> error: getting configuration: unable to contact qmaster using port 536 on host "univ-iec-laptop"
> >> error: can't get configuration from qmaster -- backgrounding
> >> </snip>
> >>
> >>
> >> Here is the contents of the messages file.
> >> <snip>
> >> 02/22/2007 06:57:23|qmaster|univ-iec-laptop|I|read job database with 0 entries in 0 seconds
> >> 02/22/2007 06:57:23|qmaster|univ-iec-laptop|I|qmaster hard descriptor limit is set to 8192
> >> 02/22/2007 06:57:23|qmaster|univ-iec-laptop|I|qmaster soft descriptor limit is set to 8192
> >> 02/22/2007 06:57:23|qmaster|univ-iec-laptop|I|qmaster will use max. 8172 file descriptors for communication
> >> 02/22/2007 06:57:23|qmaster|univ-iec-laptop|I|qmaster will accept max. 99 dynamic event clients
> >> 02/22/2007 06:57:23|qmaster|univ-iec-laptop|I|starting up 6.0u9
> >> 02/22/2007 06:57:23|qmaster|univ-iec-laptop|W|can't open sequence number file "jobseqnum": for reading: No such file or directory -- guessing next job number
> >> </snip>
> >>
> >> I am trying to put together a demo for some univ student. 
> >> Any help is much appreciated.
> >> Regards,
> >> Jeevan
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>
> >>   
> >>     
> >
> > I have SGE 6.0u9 and 6.0u10 running on Ubuntu 6.10, so I'm going to
> > assume that 6.06 behaves the same.
> >
> > Did you make sure to add ports 536 and 537 to your /etc/services?  I
> > usually do this by appending to /etc/services as root:
> >
> > echo sge_qmaster 536/tcp >> /etc/services ; echo sge_execd 537/tcp >> /etc/services
> >
> >   
>
I will give you step-by-step instructions on how I got the SGE
6.0u9/6.0u10 master installed on my Ubuntu 6.10 system.

First, I installed Ubuntu 6.10 and chose the command-line-only
installation from the Ubuntu 6.10 alternate installation CD.  The
installation is fast and minimal, and it's easy to do a network
install if you choose to.
Once it's all done and installed:

1) create a root account so you can work with root ownership and get
right into working with the system
2) download the SGE common files and the 6.0u10 binaries
3) log in as root if you haven't already
4) aptitude install ssh (for remote management and connectivity; the
packages come from the installation media)
5) mkdir -p /opt/gridengine
6) untar the SGE common files and the 6.0u10 (or 6.0u9) binaries into
/opt/gridengine
7) chown -R root /opt/gridengine
8) export SGE_ROOT=/opt/gridengine
9) echo sge_qmaster 536/tcp >> /etc/services ; echo sge_execd 537/tcp >> /etc/services
10) add the hostname and IP address of the local computer to the
/etc/hosts file
11) run /opt/gridengine/util/setfileperm.sh $SGE_ROOT to set the
correct permissions (this might be redundant after step 7)
12) run the install_qmaster shell script
- I pretty much go through the script as normal: use the DEFAULT cell
name, choose a regular user to be the SGE admin, choose CLASSIC spooling
instead of BerkeleyDB, and have the daemons installed to start at boot.
13) after it's done, run ps aux | grep sge to make sure the sge daemons
are running. 
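
Step 9 appends blindly, so running it a second time leaves duplicate
lines in /etc/services. A minimal sketch of a repeatable version is
below. The function name add_service_entry is hypothetical, and the
ports are the 536/537 used above. The block only defines the function,
so nothing is modified until it is called as root.

```shell
#!/bin/sh
# Append "NAME PORT/tcp" to a services file unless NAME is already listed.
# Usage: add_service_entry FILE NAME PORT
# (add_service_entry is an illustrative helper, not part of SGE.)
add_service_entry() {
    file=$1
    name=$2
    port=$3
    # Look for the service name at the start of a line, followed by whitespace.
    if grep -q "^${name}[[:space:]]" "$file" 2>/dev/null; then
        echo "$name already present in $file"
    else
        printf '%s\t%s/tcp\n' "$name" "$port" >> "$file"
        echo "added $name $port/tcp to $file"
    fi
}
```

As root, the two entries from step 9 would then be added with
"add_service_entry /etc/services sge_qmaster 536" and
"add_service_entry /etc/services sge_execd 537".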

That's pretty much the gist of getting the frontend computer installed. 
You can then move on to the other computers and run the install_execd
shell scripts. 
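
Since the execd hosts have to reach the master by name, step 10 is
worth double-checking before moving on. Here is a rough sketch,
assuming the usual "address name [alias ...]" layout of a hosts file.
The name host_has_real_ip is made up for illustration, and treating
127.* and ::1 as not good enough is an assumption about what "the IP
address of the local computer" should mean here.

```shell
#!/bin/sh
# Succeed if FILE maps NAME to an address that is not a loopback address.
# Usage: host_has_real_ip FILE NAME
# (host_has_real_ip is an illustrative helper, not part of SGE.)
host_has_real_ip() {
    awk -v name="$2" '
        {
            sub(/#.*/, "")                 # drop comments
            for (i = 2; i <= NF; i++)      # field 1 is the address
                if ($i == name && $1 !~ /^127\./ && $1 != "::1")
                    found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

For example, host_has_real_ip /etc/hosts "$(hostname)" || echo "check
/etc/hosts" would flag a machine whose name only appears on a
loopback line.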

As you can see, I haven't even begun to configure any other services
like NFS, NIS, iptables, SSH, or DNS, and I am able to get SGE up and
running and communicating with the nodes in the cluster.
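
If sge_qmaster still refuses to start after all this, the messages
file quoted earlier in the thread is the first place to look. A small
sketch that prints only the non-informational entries is below. It
assumes the pipe-separated layout seen in that log (date and time,
daemon, host, level, text) and assumes that "I" marks informational
lines. non_info_messages is a hypothetical helper name.

```shell
#!/bin/sh
# Print entries from an SGE messages FILE whose level field is not "I".
# Assumed layout per line: date time|daemon|host|level|text
# Usage: non_info_messages FILE
# (non_info_messages is an illustrative helper, not part of SGE.)
non_info_messages() {
    awk -F'|' 'NF >= 5 && $4 != "I"' "$1"
}
```

With classic spooling and the default cell, the file would typically be
$SGE_ROOT/default/spool/qmaster/messages, but treat that path as an
assumption and check where your own install writes it.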

Hope this helps in some way or another. 

-Ken




