[GE users] sge stopped: error: getting configuration:

Joachim Gabler Joachim.Gabler at Sun.COM
Tue May 3 15:17:14 BST 2005


Did qmaster write a core file?
Could you please try to produce a stack trace (using gdb)?
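
A minimal sketch of how such a stack trace could be collected non-interactively, assuming gdb is installed and a core file exists. The function names gdb_commands and qmaster_backtrace are hypothetical helpers, not part of SGE:

```shell
# Hypothetical helpers for illustration; not shipped with SGE.

# Print the gdb commands to run against the core file.
gdb_commands() {
    printf '%s\n' 'bt' 'info threads' 'thread apply all bt'
}

# Usage: qmaster_backtrace <path to sge_qmaster binary> <core file>
qmaster_backtrace() {
    binary=$1
    core=$2
    cmdfile=$(mktemp) || return 1
    gdb_commands > "$cmdfile"
    # -batch makes gdb exit once the commands read via -x have run.
    gdb -batch -x "$cmdfile" "$binary" "$core"
    status=$?
    rm -f "$cmdfile"
    return "$status"
}
```

If no core file was written, running "ulimit -c unlimited" in the shell that starts qmaster may be needed first. The binary would be the one from this thread (/opt/sge/bin/lx24-x86/sge_qmaster); where the core file ends up depends on the system, so its path is left as an argument.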

Apart from the jobs, all spooled files are ASCII files.
I would suggest deleting the spooled jobs: everything under
<spooldir>/jobs and <spooldir>/job_scripts.
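
As a more cautious variant of the deletion, the spooled jobs could be moved aside instead, so they can still be inspected later. This is only a sketch: set_aside_jobs is a hypothetical helper, the backup location is an assumption, it relies on find supporting -mindepth/-maxdepth (GNU find does), and qmaster should be stopped while it runs:

```shell
# Hypothetical helper: empty <spooldir>/jobs and <spooldir>/job_scripts
# by moving their contents into a backup directory.
# Usage: set_aside_jobs <spooldir> [backupdir]
set_aside_jobs() {
    spooldir=$1
    # Default backup location (an assumption): next to the spool directory.
    backup=${2:-"$spooldir.saved-jobs.$(date +%Y%m%d%H%M%S)"}
    for d in jobs job_scripts; do
        [ -d "$spooldir/$d" ] || continue
        mkdir -p "$backup/$d" || return 1
        # Move every top-level entry; the original directory itself stays
        # in place, so its ownership and permissions are unchanged.
        find "$spooldir/$d" -mindepth 1 -maxdepth 1 \
            -exec mv {} "$backup/$d/" \; || return 1
    done
    # Report where the jobs went.
    echo "$backup"
}
```

Called as "set_aside_jobs <spooldir>", it prints the backup directory. Once qmaster is running again and the old jobs are no longer needed, that directory can be removed.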

If qmaster still doesn't start up, have a look at the other spooled objects:
<spooldir>/admin_hosts,
<spooldir>/calendars,
....

Just run cat on each file and check that it looks OK.
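
The check above could be scripted roughly as follows. This is a sketch under assumptions: show_spool is a hypothetical helper, and treating empty files or files with non-printable bytes as suspicious is a heuristic of this example, not an SGE rule:

```shell
# Hypothetical helper: print every spooled file except the jobs,
# flagging files that are empty or contain non-printable bytes.
# Usage: show_spool <spooldir>   (path without a trailing slash)
show_spool() {
    spooldir=$1
    find "$spooldir" -type f \
        ! -path "$spooldir/jobs/*" \
        ! -path "$spooldir/job_scripts/*" | sort |
    while IFS= read -r f; do
        echo "==== $f"
        if [ ! -s "$f" ]; then
            echo "(empty file)"
        elif LC_ALL=C grep -q '[^[:print:][:space:]]' "$f"; then
            # Do not dump possibly binary data to the terminal.
            echo "(contains non-printable bytes)"
        else
            cat "$f"
        fi
    done
}
```

Piping the output through a pager, for example "show_spool <spooldir> | less", makes it easier to skim the files one by one.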

  Joachim


Patrice Hamelin wrote:

> I am using classic spooling.
>
>
> [root@stokes util]# /opt/sge/bin/lx24-x86/sge_qmaster
> Reading in complex attributes.
> Reading in execution hosts.
> Reading in administrative hosts.
> Reading in submit hosts.
> Reading in host group entries:
>         Host group entries for group "@allhosts".
>         Host group entries for group "@single".
>         Host group entries for group "@multi".
>         Host group entries for group "@gal".
>         Host group entries for group "@bigmem".
>         Host group entries for group "@multi2".
> Reading in usersets:
>         Userset "gal".
>         Userset "defaultdepartment".
>         Userset "deadlineusers".
>         Userset "admin".
> Reading in queues:
>         Queue "batch".
>         Queue "multi".
>         Queue "single".
>         Queue "bigmem".
>         Queue "multi2".
> Reading in parallel environments:
>         PE "mpich_1".
>         PE "mpich_2".
> Reading in ckpt interface definitions:
>         CKPT "blcr".
> Reading in Master_Job_List.
> .
>
> read job database with 15 entries in 1 seconds
> Segmentation fault
> [root@stokes util]#
>
>
> Joachim Gabler wrote:
>
>> Patrice,
>>
>> what spooling method are you using (classic / berkeleydb)?
>>
>> Please try to start up qmaster in debug mode:
>> In a shell as user root:
>> source $SGE_ROOT/util/dl.(c)sh
>> dl 1
>> $SGE_ROOT/bin/<arch>/sge_qmaster
>>
>> This might show some error messages, e.g. when reading jobs from disk.
>>
>>   Joachim
>>
>> Daniel Templeton wrote:
>>
>>> Looks to me like running out of memory caused the qmaster to leave 
>>> the cluster in a broken state.  You'll need to clean up whatever the 
>>> qmaster left behind.  That may involve using utilbin/spooledit or 
>>> deleting jobs from the spool directory.  Unfortunately, you'll need 
>>> the advice of someone who actually recovers broken clusters instead 
>>> of just reinstalling them like I do.  Joachim?  Stephan?  Omar?
>>>
>>> Daniel
>>>
>>> Patrice Hamelin wrote:
>>>
>>>> After running sgemaster, qmaster is NOT running.  I run SGE 6.0u1
>>>> on RedHat Linux 7.3.  See my other message: I had a memory problem,
>>>> which I think caused the problem yesterday.
>>>>
>>>> Thanks guys for help!
>>>>
>>>> Daniel Templeton wrote:
>>>>
>>>>> After running sgemaster, is your qmaster running?  What platform 
>>>>> and SGE version?  Was there an event which caused the qmaster to 
>>>>> stop?
>>>>>
>>>>> Daniel
>>>>>
>>>>> Patrice Hamelin wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>   My qmaster stopped a couple of hours ago and I cannot restart
>>>>>> it.
>>>>>> I always get:
>>>>>>
>>>>>> [root@stokes common]# /etc/init.d/sgemaster start
>>>>>>    starting sge_qmaster
>>>>>>    starting sge_schedd
>>>>>> error: getting configuration: unable to contact qmaster using 
>>>>>> port 536
>>>>>> on host "stokes.clumeq.mcgill.ca"
>>>>>> can't get configuration from qmaster -- waiting ...
>>>>>>
>>>>>>
>>>>>>   thanks for help!
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>
>>>>
>>>
>>>
>>
>
