[GE users] sge stopped: error: getting configuration:

Patrice Hamelin phamelin at clumeq.mcgill.ca
Tue May 3 15:07:17 BST 2005


[root at stokes qmaster]# ulimit -a
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 8192
cpu time             (seconds, -t) unlimited
max user processes            (-u) 7168
virtual memory        (kbytes, -v) unlimited


[root at stokes qmaster]# ps -ef | grep sge
root      4630  1540  0 10:07 pts/0    00:00:00 grep sge
[root at stokes qmaster]# /opt/sge/default/common/sgemaster
    starting sge_qmaster
    starting sge_schedd
error: getting configuration: unable to contact qmaster using port 536 on host "stokes.clumeq.mcgill.ca"
can't get configuration from qmaster -- waiting ...
[root at stokes qmaster]# ps -ef | grep sge
root      4702  1540  0 10:08 pts/0    00:00:00 grep sge
[root at stokes qmaster]#
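
The checks discussed in this thread (no stale sge_qmaster or sge_schedd process, the current ulimit values, and whether anything is already listening on the qmaster port) can be bundled into a small pre-start script. This is only a sketch under assumptions: port 536 and the /opt/sge root with the "default" cell are taken from the transcript above, the function names are made up for illustration, and the messages path assumes the usual spool layout ($SGE_ROOT/$SGE_CELL/spool/qmaster/messages), which may differ on other installations.

```shell
#!/bin/sh
# Pre-start checks for sgemaster: a sketch, not an official SGE tool.
# Assumptions: SGE_ROOT, SGE_CELL and QMASTER_PORT default to the values
# seen in the transcript; the function names are hypothetical.

SGE_ROOT=${SGE_ROOT:-/opt/sge}
SGE_CELL=${SGE_CELL:-default}
QMASTER_PORT=${QMASTER_PORT:-536}

# Print PID and name of any sge_qmaster or sge_schedd still running.
leftover_daemons() {
    ps -e -o pid= -o comm= | awk '$2 ~ /^sge_(qmaster|schedd)/'
}

# Read "netstat -tln"-style lines on stdin; succeed if some TCP socket
# is listening on the port given as $1.
port_in_listing() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Run all checks and report what was found.
preflight() {
    stale=$(leftover_daemons)
    if [ -n "$stale" ]; then
        echo "warning: SGE daemons still running:"
        echo "$stale"
    fi
    echo "open files limit: $(ulimit -n)"
    if netstat -tln 2>/dev/null | port_in_listing "$QMASTER_PORT"; then
        echo "warning: port $QMASTER_PORT is already in use"
    fi
    # Assumed default location of the qmaster log (classic spool layout).
    log="$SGE_ROOT/$SGE_CELL/spool/qmaster/messages"
    [ -r "$log" ] && tail -n 20 "$log"
    return 0
}
```

After sourcing the file, run preflight as root before /opt/sge/default/common/sgemaster. If the daemons start and vanish again, as in the transcript above, the tail of the messages file is the first place to look for the reason.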


McCalla, Mac wrote:
> Make sure there is not a schedd process left running on your server when
> you try to restart with the sgemaster command.  Also, what is the result
> of a ulimit -a command issued just prior to issuing the sgemaster start
> command?....mac
> 
> -----Original Message-----
> From: Daniel Templeton [mailto:Dan.Templeton at Sun.COM] 
> Sent: Tuesday, May 03, 2005 8:55 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] sge stopped: error: getting configuration:
> 
> Looks to me like running out of memory caused the qmaster to leave the 
> cluster in a broken state.  You'll need to clean up whatever the qmaster
> left behind.  That may involve using utilbin/spooledit or deleting jobs 
> from the spool directory.  Unfortunately, you'll need the advice of 
> someone who actually recovers broken clusters instead of just 
> reinstalling them like I do.  Joachim?  Stephan?  Omar?
> 
> Daniel
> 
> Patrice Hamelin wrote:
> 
> 
>>After running sgemaster, qmaster is NOT running.  I run SGE 6.0u1 on 
>>RedHat Linux 7.3.  See my other message: I had a memory problem, which I
>>think caused the problem yesterday.
>>
>>Thanks guys for help!
>>
>>Daniel Templeton wrote:
>>
>>
>>>After running sgemaster, is your qmaster running?  What platform and 
>>>SGE version?  Was there an event which caused the qmaster to stop?
>>>
>>>Daniel
>>>
>>>Patrice Hamelin wrote:
>>>
>>>
>>>>Hi,
>>>>
>>>>  My qmaster stopped a couple of hours ago and I cannot restart it.
>>>>I always get:
>>>>
>>>>[root at stokes common]#  /etc/init.d/sgemaster start
>>>>   starting sge_qmaster
>>>>   starting sge_schedd
>>>>error: getting configuration: unable to contact qmaster using port 536
>>>>on host "stokes.clumeq.mcgill.ca"
>>>>can't get configuration from qmaster -- waiting ...
>>>>
>>>>
>>>>  thanks for help!
>>>
>>>
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>
> 
> 

-- 
Patrice Hamelin ing, M.Sc.A, CCNA
Systems Administrator
CLUMEQ Supercomputer Centre
McGill University
688 Sherbrooke Street West, Suite 710
Montreal, QC, Canada H3A 2S6
Tel: 514-398-3344
Fax: 514-398-2203
http://www.clumeq.mcgill.ca
