[GE users] sge stopped: error: getting configuration:

Patrice Hamelin phamelin at clumeq.mcgill.ca
Tue May 3 15:02:15 BST 2005



[root@stokes lx24-x86]# /opt/sge/utilbin/lx24-x86/spooledit list
Segmentation fault


Interesting!!
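
Before cleaning anything out of the spool by hand, it may help to confirm the daemon state, read the end of the qmaster log, and keep a copy of the spool directory. The following is only a rough sketch, not a tested recovery procedure. SGE_ROOT=/opt/sge is taken from the utilbin path above; the cell name "default", the spool/qmaster/messages location, and all function names are assumptions for illustration and may differ on your installation.

```shell
#!/bin/sh
# Sketch only: the paths and the cell name are assumptions, adjust for your site.
SGE_ROOT="${SGE_ROOT:-/opt/sge}"   # taken from the utilbin path in this thread
SGE_CELL="${SGE_CELL:-default}"    # assumed cell name

# Print the assumed location of the qmaster messages file
# for a given root directory ($1) and cell name ($2).
qmaster_messages_path() {
    printf '%s/%s/spool/qmaster/messages\n' "$1" "$2"
}

# Succeed if a process named sge_qmaster is in the process table.
qmaster_running() {
    ps -e 2>/dev/null | grep -w 'sge_qmaster' >/dev/null
}

# Copy a spool directory ($1) to a timestamped sibling directory
# and print the name of the copy, so manual cleanup can be undone.
backup_spool() {
    src="$1"
    dest="$1.bak.$(date +%Y%m%d%H%M%S)"
    cp -Rp "$src" "$dest" && printf '%s\n' "$dest"
}

if qmaster_running; then
    echo "sge_qmaster is running"
else
    echo "sge_qmaster is NOT running"
fi

msgs=$(qmaster_messages_path "$SGE_ROOT" "$SGE_CELL")
if [ -r "$msgs" ]; then
    tail -n 20 "$msgs"   # the most recent entries may explain why the daemon exited
else
    echo "no readable messages file at $msgs"
fi
```

Calling backup_spool on the spool directory (for example the directory that holds the spooled jobs) before deleting anything leaves a copy to fall back on if the cleanup makes things worse.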

Daniel Templeton wrote:
> Looks to me like running out of memory caused the qmaster to leave the 
> cluster in a broken state.  You'll need to clean up whatever the qmaster 
> left behind.  That may involve using utilbin/spooledit or deleting jobs 
> from the spool directory.  Unfortunately, you'll need the advice of 
> someone who actually recovers broken clusters instead of just 
> reinstalling them like I do.  Joachim?  Stephan?  Omar?
> 
> Daniel
> 
> Patrice Hamelin wrote:
> 
>> After running sgemaster, qmaster is NOT running.  I run SGE 6.0u1 on 
>> RedHat Linux 7.3.  See my other message: I had a memory problem, which 
>> I think caused the problem yesterday.
>>
>> Thanks guys for help!
>>
>> Daniel Templeton wrote:
>>
>>> After running sgemaster, is your qmaster running?  What platform and 
>>> SGE version?  Was there an event which caused the qmaster to stop?
>>>
>>> Daniel
>>>
>>> Patrice Hamelin wrote:
>>>
>>>> Hi,
>>>>
>>>>   My qmaster stopped a couple of hours ago and I cannot restart it.
>>>> I always get:
>>>>
>>>> [root@stokes common]#  /etc/init.d/sgemaster start
>>>>    starting sge_qmaster
>>>>    starting sge_schedd
>>>> error: getting configuration: unable to contact qmaster using port 536
>>>> on host "stokes.clumeq.mcgill.ca"
>>>> can't get configuration from qmaster -- waiting ...
>>>>
>>>>
>>>>   thanks for help!
>>>
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>
> 
> 
> 

-- 
Patrice Hamelin ing, M.Sc.A, CCNA
Systems Administrator
CLUMEQ Supercomputer Centre
McGill University
688 Sherbrooke Street West, Suite 710
Montreal, QC, Canada H3A 2S6
Tel: 514-398-3344
Fax: 514-398-2203
http://www.clumeq.mcgill.ca
