[GE users] qmaster always die

sgexav xaviercouvelard at gmail.com
Thu Nov 19 16:16:39 GMT 2009



OK, coming back to this issue.

In debug mode, qmaster says:

  1828  17483 scheduler000     ================[SCHEDULING-EPOCH 200911191612.03]==================
  1829  17483 scheduler000     RAW CQ:1, J:4, H:21, C:47, A:2, D:1, P:0, CKPT:0, US:4, PR:0, RQS:0, AR:0, S:nd:0/lf:0
  1830  17483     timer000     load_report_interval: load value timeout for host compute-0-7.local is 40
  1831  17483     timer000     load_report_interval: load value timeout for host compute-0-1.local is 40
  1832  17483     timer000     load_report_interval: load value timeout for host compute-0-17.local is 40
  1833  17483     timer000     load_report_interval: load value timeout for host compute-0-14.local is 40
  1834  17483     timer000     load_report_interval: load value timeout for host compute-0-4.local is 40
/etc/init.d/sgemaster.nautilus: line 615: 17483 Segmentation fault      $bin_dir/sge_qmaster

Does this mean anything to anyone?
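
On the earlier "error 4" question (the dmesg lines quoted further down): as far as I understand, on x86 the number after "error" is the page-fault error code, printed in hex, where bit 0 distinguishes a protection violation from a page that is not present, bit 1 marks a write, and bit 2 marks a user-mode access. Below is a minimal Python sketch that parses such lines and decodes the code. It assumes the 2.6-era x86_64 message format shown in the quote; the function names (parse_segfault, decode_page_fault_error, summarize) are made up for illustration and are not part of any SGE or kernel tool.

```python
import re
from collections import Counter

# Matches the kernel message format seen in the quoted dmesg output
# (assumption: 2.6-era x86_64 layout "segfault at ... rip ... rsp ... error N").
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_page_fault_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
        "instruction_fetch": bool(code & 16),
    }


def parse_segfault(line):
    """Return the fields of one segfault line as a dict, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "prog": m.group("prog"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        "error": int(m.group("err"), 16),  # the kernel prints this in hex
    }


def summarize(lines):
    """Count how often each (fault address, rip, error code) triple occurs."""
    faults = [f for f in map(parse_segfault, lines) if f]
    return Counter((f["addr"], f["rip"], f["error"]) for f in faults)


# Two of the lines from the quoted dmesg output, with the line wrap undone.
sample = [
    "sge_qmaster[18202]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 000000004883c888 error 4",
    "sge_qmaster[18413]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 00000000484ce888 error 4",
]

for (addr, rip, err), count in summarize(sample).items():
    print(f"{count}x addr={addr:#x} rip={rip:#x} error={err:#x} -> {decode_page_fault_error(err)}")
```

Read this way, error 4 would be a user-mode read of a page that is not mapped, and a fault address of 0x80 is a small offset from zero, which usually suggests a NULL pointer dereference. Every crash also shows the same rip, so it looks like the same code path each time, which would fit a software bug better than random hardware faults, though that is only an inference from these lines.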

cheers
Xavier
reuti wrote:
> Hi,
>
> found this: http://www.ax86.net/2008/06/22/linux-segmentation-fault-fehlercodes
>
> Could it be faulty hardware or a faulty file system? AFAIK ROCKS ships its
> own compiled versions of SGE. Could you try downloading the binaries
> directly from sunsource.net and check whether they behave
> differently, i.e. fine?
>
> -- Reuti
>
>
> On 30.10.2009 at 17:53, sgexav wrote:
>
>   
>> Finally I got these error messages (dmesg). Does anyone know what error 4
>> is?
>>
>> sge_qmaster[18202]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 000000004883c888 error 4
>> sge_qmaster[18413]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 00000000484ce888 error 4
>> sge_qmaster[18557]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 0000000047d79888 error 4
>> sge_qmaster[10616]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 00000000483c7888 error 4
>> sge_qmaster[10769]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 0000000047f00888 error 4
>> sge_qmaster[10910]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 0000000048e45888 error 4
>> sge_qmaster[15373]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 00000000481e1888 error 4
>> sge_qmaster[19629]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 0000000047f8b888 error 4
>> sge_qmaster[25172]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 0000000048331888 error 4
>> sge_qmaster[25545]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 0000000047f60888 error 4
>> sge_qmaster[1132]: segfault at 0000000000000080 rip 00000036ed078d80 rsp 0000000047a01888 error 4
>>
>> Could it be related to time synchronisation issues?
>>
>> Xav
>> reuti wrote:
>>     
>>> On 27.10.2009 at 13:20, sgexav wrote:
>>>
>>>
>>>       
>>>> Hi,
>>>> I don't know about JSV, so I suppose I am not using it.
>>>> It was working perfectly, and started crashing after we removed and
>>>> reinstalled the nodes,
>>>>
>>>>         
>>> Do all machines still run the same version of SGE?
>>>
>>> -- Reuti
>>>
>>>
>>>
>>>       
>>>> without reinstalling the frontend.
>>>> I am looking for useful log messages and will let you know.
>>>>
>>>> thanks
>>>> Xavier
>>>>
>>>> stephendennis wrote:
>>>>
>>>>         
>>>>> Hello Xavier
>>>>>
>>>>> Are you using a JSV?  There is a bug in the jsv processing
>>>>> which can cause a qmaster crash.
>>>>>
>>>>> Stephen
>>>>>
>>>>> ________________________________________
>>>>> From: sgexav [xaviercouvelard at gmail.com]
>>>>> Sent: Tuesday, October 27, 2009 7:59 AM
>>>>> To: users at gridengine.sunsource.net
>>>>> Subject: [GE users] qmaster always die
>>>>>
>>>>> Hi all,
>>>>> Sorry to repost this message, but I haven't managed to solve the
>>>>> problem and do not even know where to find useful log messages.
>>>>> I am using SGE 6.2u2, which comes with ROCKS 5.2,
>>>>> and sge_qmaster always dies, sometimes after a few hours but most of
>>>>> the time every 3 minutes, which is very inconvenient.
>>>>>
>>>>> It would be very nice if someone could help to solve it.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Xavier
>>>>>
>>>>> ------------------------------------------------------
>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=223543
>>>>>
>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>>
>>>>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=228011

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


