[GE users] GE 6.2 installation on Linux: sge_qmaster didn't start

reuti reuti at staff.uni-marburg.de
Fri Nov 21 10:02:17 GMT 2008


Hi,

On 21.11.2008 at 01:46, Lin Shao wrote:

> Hi,
>
> I tried many times to install GE 6.2 on my Linux system (Debian etch,
> x86_64, kernel 2.6.18-6) (by executing ./install_qmaster in $SGE_ROOT
> as root

This is the correct way. Just make sure the owner of your $SGE_ROOT
(often /usr/sge) is your designated SGE admin user.

- Kill the running sge_qmaster before proceeding
- Remove the $SGE_ROOT/default directory
- What path did you enter for the spooling directories?
- Is $SGE_ROOT local? With NFS and root_squash it might not work.
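
The checks above could be scripted roughly as follows. This is only a
sketch: the helper names (dir_owner, fs_type, qmaster_running,
check_sge_root) are made up for illustration, it assumes GNU stat and
pgrep as found on a typical Debian system, and it only reports what it
finds. It does not kill or remove anything.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; GNU stat and pgrep are assumed.

# Print the user name owning a directory (should be the SGE admin user).
dir_owner() {
    stat -c '%U' "$1"
}

# Print the file system type of a path, e.g. "nfs" for an NFS mount.
fs_type() {
    stat -f -c '%T' "$1"
}

# Succeed if a process named exactly sge_qmaster is running.
qmaster_running() {
    pgrep -x sge_qmaster >/dev/null 2>&1
}

# Report on a given SGE root and its "default" cell directory.
check_sge_root() {
    root=$1
    echo "owner of $root: $(dir_owner "$root")"
    echo "file system type: $(fs_type "$root")"
    if qmaster_running; then
        echo "sge_qmaster is running: kill it before reinstalling"
    else
        echo "no sge_qmaster process found"
    fi
    if [ -d "$root/default" ]; then
        echo "$root/default exists: remove it before reinstalling"
    else
        echo "$root/default is absent"
    fi
}

# Only run when SGE_ROOT is set, so sourcing the file has no side effects.
if [ -n "${SGE_ROOT:-}" ]; then
    check_sge_root "$SGE_ROOT"
fi
```

If the file system type comes back as nfs, the root_squash question
still has to be answered on the NFS server side; the script cannot see
the export options.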

-- Reuti

> and then using a non-root admin account who owns $SGE_ROOT
> folder when prompted by the script), but each time the script gets to
> the point where it tries to start the qmaster daemon, it fails with
> the messages:
> """
>
> Grid Engine qmaster startup
> ---------------------------
>
> Starting qmaster daemon. Please wait ...
>    starting sge_qmaster
>
> sge_qmaster didn't start!
> Please check the messages file
>
> Hit <RETURN> to continue >>
>
> """
>
> and later these error messages indicate sge_qmaster is not running:
> """
>
> Creating the default <all.q> queue and <allhosts> hostgroup
> -----------------------------------------------------------
>
> error: commlib error: can't connect to service (Connection refused)
> unable to contact qmaster using port 6444 on host "macaca.ucsf.edu"
>
> Command failed: ./bin/lx24-amd64/qconf -Ahgrp /tmp/hostqueue22557
>
> Probably a permission problem. Please check file access permissions.
> Check root read/write permission. Check if SGE daemons are running.
>
> """
>
>
> However, $SGE_ROOT/default/spool/qmaster/messages says:
>
> 11/20/2008 16:35:08|  main|macaca|I|read job database with 0 entries in 0 seconds
> 11/20/2008 16:35:08|  main|macaca|E|error opening file "/jws31/shao/sge_root/default/common/./sched_configuration" for reading: No such file or directory
> 11/20/2008 16:35:08|  main|macaca|E|error opening file "/jws31/shao/sge_root/default/spool/qmaster/./sharetree" for reading: No such file or directory
> 11/20/2008 16:35:08|  main|macaca|I|qmaster hard descriptor limit is set to 8192
> 11/20/2008 16:35:08|  main|macaca|I|qmaster soft descriptor limit is set to 8192
> 11/20/2008 16:35:08|  main|macaca|I|qmaster will use max. 8172 file descriptors for communication
> 11/20/2008 16:35:08|  main|macaca|I|qmaster will accept max. 99 dynamic event clients
> 11/20/2008 16:35:08|  main|macaca|I|starting up GE 6.2 (lx24-amd64)
> 11/20/2008 16:35:08|  main|macaca|W|can't open job sequence number file "jobseqnum": for reading: No such file or directory -- guessing next number
> 11/20/2008 16:35:08|  main|macaca|W|can't open ar sequence number file "arseqnum": for reading: No such file or directory -- guessing next number
>
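
To pick the error and warning entries out of a longer messages file,
something like the following might help. It is a sketch that assumes
the field layout seen in the excerpt above (time|thread|host|level|message,
with E and W as the levels of interest); show_problems is a made-up
helper name.

```shell
#!/bin/sh
# Sketch only: show_problems is a hypothetical helper name.
# Field layout assumed from the excerpt: time|thread|host|level|message
# Caveat: a message that itself contains "|" is cut at that character.
show_problems() {
    awk -F'|' '$4 == "E" || $4 == "W" { print $4 ": " $5 }' "$1"
}

# Example call, using the messages file location quoted in this thread:
# show_problems "$SGE_ROOT/default/spool/qmaster/messages"
```
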
> And there is a process sge_qmaster running under the admin user:
>
>   PID TTY          TIME CMD
> 23081 ?        00:00:00 sge_qmaster
>
>
> Any idea what's going on?
>
> I have these lines in my /etc/services:
>
> sge_commd       536/tcp                         # Grid Engine service
> sge_qmaster     6444/tcp                        # Grid Engine Qmaster Service
> sge_execd       6445/tcp                        # Grid Engine Execution Service
>
>
> Thank you very much!
>
> -lin
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89289
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89317

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


