[GE users] GE 6.2 installation on Linux: sge_qmaster didn't start

Lin Shao shao at msg.ucsf.edu
Fri Nov 21 00:46:13 GMT 2008


Hi,

I have tried many times to install GE 6.2 on my Linux system (Debian
etch, x86_64, kernel 2.6.18-6). I run ./install_qmaster in $SGE_ROOT
as root and, when the script prompts for an admin user, give it a
non-root account that owns the $SGE_ROOT folder. Each time, when the
script reaches the point where it tries to start the qmaster daemon,
it fails with these messages:
"""

Grid Engine qmaster startup
---------------------------

Starting qmaster daemon. Please wait ...
   starting sge_qmaster

sge_qmaster didn't start!
Please check the messages file

Hit <RETURN> to continue >>

"""

Later, these error messages indicate that sge_qmaster is not running:
"""

Creating the default <all.q> queue and <allhosts> hostgroup
-----------------------------------------------------------

error: commlib error: can't connect to service (Connection refused)
unable to contact qmaster using port 6444 on host "macaca.ucsf.edu"

Command failed: ./bin/lx24-amd64/qconf -Ahgrp /tmp/hostqueue22557

Probably a permission problem. Please check file access permissions.
Check root read/write permission. Check if SGE daemons are running.

"""

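The "Connection refused" error above means nothing accepted a TCP
connection on that port at that moment. A minimal sketch, assuming
Python 3 is available on the host, of how to test this independently of
qconf. The function name port_is_open is hypothetical and is not part
of Grid Engine:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

For example, port_is_open("macaca.ucsf.edu", 6444) uses the host and
port from the error message. A False result while an sge_qmaster
process exists would suggest that the daemon is not listening on that
port, or not on that interface.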

However, $SGE_ROOT/default/spool/qmaster/messages says:

11/20/2008 16:35:08|  main|macaca|I|read job database with 0 entries in 0 seconds
11/20/2008 16:35:08|  main|macaca|E|error opening file "/jws31/shao/sge_root/default/common/./sched_configuration" for reading: No such file or directory
11/20/2008 16:35:08|  main|macaca|E|error opening file "/jws31/shao/sge_root/default/spool/qmaster/./sharetree" for reading: No such file or directory
11/20/2008 16:35:08|  main|macaca|I|qmaster hard descriptor limit is set to 8192
11/20/2008 16:35:08|  main|macaca|I|qmaster soft descriptor limit is set to 8192
11/20/2008 16:35:08|  main|macaca|I|qmaster will use max. 8172 file descriptors for communication
11/20/2008 16:35:08|  main|macaca|I|qmaster will accept max. 99 dynamic event clients
11/20/2008 16:35:08|  main|macaca|I|starting up GE 6.2 (lx24-amd64)
11/20/2008 16:35:08|  main|macaca|W|can't open job sequence number file "jobseqnum": for reading: No such file or directory -- guessing next number
11/20/2008 16:35:08|  main|macaca|W|can't open ar sequence number file "arseqnum": for reading: No such file or directory -- guessing next number
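
The entries above are pipe-separated, with a one-letter field (I, W or
E) before the message text. A rough sketch of splitting such a file so
that only the E entries can be picked out. The field names (timestamp,
thread, host, level, message) are assumptions read off the sample lines,
not an official description of the format, and parse_messages is a
hypothetical helper:

```python
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    timestamp: str
    thread: str
    host: str
    level: str
    message: str


def parse_messages(lines: Iterable[str]) -> List[Entry]:
    """Split 'timestamp|thread|host|level|message' lines into entries."""
    entries: List[Entry] = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Split on the first four pipes only, so the message may contain '|'.
        parts = line.split("|", 4)
        if len(parts) == 5:
            entries.append(Entry(*(p.strip() for p in parts)))
        elif entries and line.strip():
            # Treat anything else as a continuation of the previous entry
            # (for example, a line wrapped by a mail client).
            last = entries[-1]
            entries[-1] = last._replace(
                message=last.message + " " + line.strip()
            )
    return entries
```

Filtering with [e for e in parse_messages(open(path)) if e.level == "E"]
would then list only the error entries, where path points at the
messages file.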

And there is an sge_qmaster process running under the admin user:

  PID TTY          TIME CMD
23081 ?        00:00:00 sge_qmaster


Any idea what's going on?

I have these lines in my /etc/services:

sge_commd       536/tcp                         # Grid Engine service
sge_qmaster     6444/tcp                        # Grid Engine Qmaster Service
sge_execd       6445/tcp                        # Grid Engine Execution Service
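
To double-check which port a service name maps to, entries in this
format can be parsed directly. A small sketch, assuming the usual
"name port/protocol # comment" layout shown above; parse_services is a
hypothetical helper, not a Grid Engine tool:

```python
from typing import Dict, Tuple


def parse_services(text: str) -> Dict[Tuple[str, str], int]:
    """Map (service name, protocol) to port for /etc/services-style text."""
    services: Dict[Tuple[str, str], int] = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, proto = fields[1].split("/", 1)
        if port.isdigit():
            services[(fields[0], proto)] = int(port)
    return services
```

With the three lines above as input, the lookup for
("sge_qmaster", "tcp") gives 6444, matching the port in the qconf error.
On a live system, Python's socket.getservbyname("sge_qmaster", "tcp")
performs a comparable lookup through the system's services database.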


Thank you very much!

-lin

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89289
