[GE users] Segmentation fault to do qconf -su group

Esteban Freire Garcia esfreire at cesga.es
Mon Aug 14 13:28:38 BST 2006




Hi Andreas,

---------------------------------------------------------------------------
sc1> dbx qconf.out
dbx version 5.1
Type 'help' for help.

(dbx) run -su cesga
thread 0xb signal Segmentation fault at >*[strlen, 0x3ff800d2590]
ldq_u   t0, 0(a0)
(dbx) where
>  0 strlen(0x0, 0x40d6666666666667, 0x1200cb390, 0x0, 0x12007c3b4) [0x3ff800d2590]
   1 sge_dstring_append(0x12007c3b4, 0x11fff94d0, 0x12007c444, 0x11fff94d0, 0x1) [0x120114f18]
   2 userset_get_type_string(0x47f, 0x0, 0x0, 0x1400be700, 0x11fffb680) [0x12007c440]
   3 object_append_field_to_dstring(0x11fffb550, 0x140062fd0, 0x120052e00, 0x11fffb680, 0x47f) [0x120094224]
   4 (unknown)() [0x120052fdc]
   5 spool_flatfile_write_object(0x1400c6c00, 0xd, 0x400, 0x0, 0x0) [0x120051fbc]
   6 sge_parse_qconf(0x12002cf34, 0x0, 0x14000dce8, 0x3ffc0080050, 0x1) [0x12003a61c]
   7 main(0x1, 0x140065570, 0x12002cdcc, 0x0, 0x100000003) [0x12002cf74]
(dbx) run -mu cesga
thread 0x8 signal Segmentation fault at >*[strlen, 0x3ff800d2590]
ldq_u   t0, 0(a0)
(dbx) where
>  0 strlen(0x0, 0x40d6666666666667, 0x1200cb390, 0x0, 0x12007c3b4) [0x3ff800d2590]
   1 sge_dstring_append(0x12007c3b4, 0x11fff9350, 0x12007c444, 0x11fff9350, 0x1) [0x120114f18]
   2 userset_get_type_string(0x47f, 0x0, 0x0, 0x1400be600, 0x11fffb508) [0x12007c440]
   3 object_append_field_to_dstring(0x11fffb3d0, 0x140062fd0, 0x120052e00, 0x11fffb508, 0x47f) [0x120094224]
   4 (unknown)() [0x120052fdc]
   5 spool_flatfile_write_object(0x1400c6c00, 0xd, 0x400, 0x0, 0x200000000) [0x120051fbc]
   6 (unknown)() [0x120042f18]
   7 sge_parse_qconf(0x12002cf34, 0x0, 0x14000dce8, 0x3ffc0080050, 0x1) [0x1200355ac]
   8 main(0x1, 0x140065570, 0x12002cdcc, 0x0, 0x100000003) [0x12002cf74]
(dbx) quit
---------------------------------------------------------------------------

Thanks,
Esteban

> Hi Esteban,
>
> Good!
>
> Yet the output still only tells me that qconf crashed in strlen():
>
>    Segmentation fault at >*[strlen, 0x3ff800d2590]
>
> What I need to actually locate the problem is a full stack
> trace.
>
> You get a stack trace from dbx by entering the 'where' command
> after the seg fault has happened.
>
> Thanks,
> Andreas
>
> On Mon, 14 Aug 2006, Esteban Freire Garcia wrote:
>
>>
>> Hi,
>>
>> Sorry, I couldn't reply because I was on holiday. I have now run qconf -su
>> | -mu under dbx and this is the result:
>> --------------------------------------------------------------------------
>> sc>dbx qconf.out
>> dbx version 5.1
>> Type 'help' for help.
>>
>> (dbx) run
>> SGE 6.0
>> usage: qconf [options]
>>   [-aattr obj_nm attr_nm val obj_id_lst]   add to a list attribute of an object
>>   [-Aattr obj_nm fname obj_id_lst]         add to a list attribute of an object
>>   [-acal calendar_name]                    add a new calendar
>>   [-Acal fname]                            add a new calendar from file
>>   [-ackpt ckpt_name]                       add a ckpt interface definition
>>   [-Ackpt fname]                           add a ckpt interface definition from file
>>   [-aconf host_list]                       add configurations
>>   [-Aconf file_list]                       add configurations from file_list
>>   [-ae [exec_server_template]]             add an exec host using a template
>>   [-Ae fname]                              add an exec host from file
>>   [-ah hostname]                           add an administrative host
>>   [-ahgrp group]                           add new host group entry
>>   [-Ahgrp file]                            add new host group entry from file
>>   [-am user_list]                          add user to manager list
>>   [-ao user_list]                          add user to operator list
>>   [-ap pe-name]                            add a new parallel environment
>>   [-Ap fname]                              add a new parallel environment from file
>>   [-aprj]                                  add project
>>   [-Aprj fname]                            add project from file
>>   [-aq ]                                   add a new cluster queue
>>   [-Aq fname]                              add a queue from file
>>   [-as hostname]                           add a submit host
>>   [-astnode node_shares_list]              add sharetree node(s)
>>   [-astree]                                create/modify the sharetree
>>   [-Astree fname]                          create/modify the sharetree from file
>>   [-au user_list listname_list]            add user(s) to userset list(s)
>>   [-Au fname]                              add userset from file
>>   [-auser]                                 add user
>>   [-Auser fname]                           add user from file
>>   [-clearusage]                            clear all user/project sharetree usage
>>   [-cq destin_id_list]                     clean queue
>>   [-dattr obj_nm attr_nm val obj_id_lst]   delete from a list attribute of an object
>>   [-Dattr obj_nm fname obj_id_lst]         delete from a list attribute of an object
>>   [-dcal calendar_name]                    remove a calendar
>>   [-dckpt ckpt_name]                       remove a ckpt interface definition
>>   [-dconf host_list]                       delete local configurations
>>   [-de host_list]                          remove an exec server
>>   [-dh host_list]                          remove an administrative host
>>   [-dhgrp group]                           delete host group entry
>>   [-dm user_list]                          remove user from manager list
>>   [-do user_list]                          remove user from operator list
>>   [-dp pe-name]                            remove a parallel environment
>>   [-dprj project_list]                     delete project
>>   [-dq destin_id_list]                     remove a queue
>>   [-ds host_list]                          remove submit host
>>   [-dstnode node_list]                     remove sharetree node(s)
>>   [-dstree]                                delete the sharetree
>>   [-du user_list listname_list]            remove user(s) from userset list(s)
>>   [-dul listname_list]                     remove userset list(s) completely
>>   [-duser user_list]                       delete user
>>   [-help]                                  print this help
>>   [-ke[j] host_list                        shutdown execution daemon(s)
>>   [-k{m|s}]                                shutdown master|scheduling daemon
>>   [-kec evid_list]                         kill event client
>>   [-mattr obj_nm attr_nm val obj_id_lst]   modify an attribute (or element in a sublist) of an object
>>   [-Mattr obj_nm fname obj_id_lst]         modify an attribute (or element in a sublist) of an object
>>   [-mc ]                                   modify complex attributes
>>   [-mckpt ckpt_name]                       modify a ckpt interface definition
>>   [-Mc fname]                              modify complex attributes from file
>>   [-mcal calendar_name]                    modify calendar
>>   [-Mcal fname]                            modify calendar from file
>>   [-Mckpt fname]                           modify a ckpt interface definition from file
>>   [-mconf [host_list|global]]              modify configurations
>>   [-msconf]                                modify scheduler configuration
>>   [-Msconf fname]                          modify scheduler configuration from file
>>   [-me server]                             modify exec server
>>   [-Me fname]                              modify exec server from file
>>   [-mhgrp group]                           modify host group entry
>>   [-Mhgrp file]                            modify host group entry from file
>>   [-mp pe-name]                            modify a parallel environment
>>   [-Mp fname]                              modify a parallel environment from file
>>   [-mprj project]                          modify a project
>>   [-Mprj fname]                            modify project from file
>>   [-mq queue]                              modify a queue
>>   [-Mq fname]                              modify a queue from file
>>   [-mstnode node_shares_list]              modify sharetree node(s)
>>   [-Mstree fname]                          modify/create the sharetree from file
>>   [-mstree]                                modify/create the sharetree
>>   [-mu listname_list]                      modify the given userset list
>>   [-Mu fname]                              modify userset from file
>>   [-muser user]                            modify a user
>>   [-Muser fname]                           modify a user from file
>>   [-rattr obj_nm attr_nm val obj_id_lst]   replace a list attribute of an object
>>   [-Rattr obj_nm fname obj_id_lst]         replace a list attribute of an object
>>   [-sc ]                                   show complex attributes
>>   [-scal calendar_name]                    show given calendar
>>   [-scall]                                 show a list of all calendar names
>>   [-sckpt ckpt_name]                       show ckpt interface definition
>>   [-sckptl]                                show all ckpt interface definitions
>>   [-sconf [host_list|global]]              show configurations
>>   [-sconfl]                                show a list of all local configurations
>>   [-se server]                             show given exec server
>>   [-secl]                                  show event client list
>>   [-sel]                                   show a list of all exec servers
>>   [-sep]                                   show a list of all licensed processors
>>   [-sh]                                    show a list of all administrative hosts
>>   [-shgrp group]                           show host group
>>   [-shgrp_tree group]                      show host group and used hostgroups as tree
>>   [-shgrp_resolved group]                  show host group with resolved hostlist
>>   [-shgrpl]                                show host group list
>>   [-sds]                                   show detached settings
>>   [-sm]                                    show a list of all managers
>>   [-so]                                    show a list of all operators
>>   [-sobjl obj_nm2 attr_nm val]             show objects which match the given value
>>   [-sp pe-name]                            show a parallel environment
>>   [-spl]                                   show all parallel environments
>>   [-sprj project]                          show a project
>>   [-sprjl]                                 show a list of all projects
>>   [-sq [destin_id_list]]                   show the given queue
>>   [-sql]                                   show a list of all queues
>>   [-ss]                                    show a list of all submit hosts
>>   [-sss]                                   show scheduler state
>>   [-ssconf]                                show scheduler configuration
>>   [-sstnode node_list]                     show sharetree node(s)
>>   [-rsstnode node_list]                    show sharetree node(s) and its children
>>   [-sstree]                                show the sharetree
>>   [-su listname_list]                      show the given userset list
>>   [-suser user_list]                       show user(s)
>>   [-sul]                                   show a list of all userset lists
>>   [-suserl]                                show a list of all users
>>   [-tsm]                                   trigger scheduler monitoring
>> complex_list            complex[,complex,...]
>> destin_id_list          queue[ queue ...]
>> listname_list           listname[,listname,...]
>> node_list               node_path[,node_path,...]
>> node_path               [/]node_name[[/.]node_name...]
>> node_shares_list        node_path=shares[,node_path=shares,...]
>> user_list               user|pattern[,user|pattern,...]
>> obj_nm                  "queue"|"exechost"|"pe"|"ckpt"|"hostgroup"
>> attr_nm                 (see man pages)
>> obj_id_lst              objectname [ objectname ...]
>> project_list            project[,project,...]
>> evid_list               all | evid[,evid,...]
>> host_list               all | hostname[,hostname,...]
>> obj_nm2                 "queue"|"queue_domain"|"queue_instance"|"exechost"
>>
>> Program terminated normally
>>
>> (dbx) run -su cesga
>> thread 0xb signal Segmentation fault at >*[strlen, 0x3ff800d2590]
>> ldq_u   t0, 0(a0)
>> (dbx) run -mu cesga
>> thread 0x8 signal Segmentation fault at >*[strlen, 0x3ff800d2590]
>> ldq_u   t0, 0(a0)
>> --------------------------------------------------------------------------
>>
>> Thanks,
>> Esteban
>>
>>
>>> Hi Esteban,
>>>
>>> the attachment did not come through, and unfortunately the truss
>>> output does not help in this case either.
>>>
>>> Can't you run qconf under the control of the dbx debugger? That would
>>> tell you directly in which C function qconf crashes.
>>>
>>> Regards,
>>> Andreas
>>>
>>>
>>> On Fri, 28 Jul 2006, Esteban Freire Garcia wrote:
>>>
>>>>
>>>> Hi, I executed the command 'truss -o qconf_strace -f qconf -su
>>>> cesga', and this is the end of the trace output, up to the point
>>>> where it shows "Incurred fault..":
>>>> --------------------------------------------------------------------------
>>>> 659636: select(5, 0x000000011FFFAAD8, 0x00000000, 0x00000000,
>>>> 0x000000011FFFAAB0) = 1
>>>> 659636: read(4, " h", 1)                                = 1
>>>> 659636: gettimeofday(0x000000011FFFAAB8, 0x00000000)    = 0
>>>> 659636: select(5, 0x000000011FFFAAD8, 0x00000000, 0x00000000,
>>>> 0x000000011FFFAAB0) = 1
>>>> 659636: read(4, " >", 1)                                = 1
>>>> 659636: gettimeofday(0x000000011FFFAAC8, 0x00000000)    = 0
>>>> 659636: select(5, 0x000000011FFFAAE8, 0x00000000, 0x00000000,
>>>> 0x000000011FFFAAC0) = 1
>>>> 659636: read(4, " < m i h   v e r s i o n".., 97)       = 97
>>>> 659636: gettimeofday(0x000000011FFFAAC8, 0x00000000)    = 0
>>>> 659636: select(5, 0x000000011FFFAAE8, 0x00000000, 0x00000000,
>>>> 0x000000011FFFAAC0) = 1
>>>> 659636: read(4, " < a m   v e r s i o n =".., 35)       = 35
>>>> 659636: gettimeofday(0x00000001400C2A48, 0x00000000)    = 0
>>>> 659636: gettimeofday(0x000000011FFFAD20, 0x00000000)    = 0
>>>> 659636: gettimeofday(0x000000011FFFAEC8, 0x00000000)    = 0
>>>> 659636: gettimeofday(0x000000011FFFAED0, 0x00000000)    = 0
>>>> 659636: gettimeofday(0x000000011FFFAEC8, 0x00000000)    = 0
>>>> 659636: gettimeofday(0x000000011FFFAF38, 0x00000000)    = 0
>>>> 659636: select(5, 0x000000011FFFACD8, 0x000000011FFFAAD8,
>>>> 0x00000000, 0x000000011FFFAA90) = 1
>>>> 659636: gettimeofday(0x000000011FFFAEE0, 0x00000000)    = 0
>>>> 659636: gettimeofday(0x000000011FFFAEC0, 0x00000000)    = 0
>>>> 659636: gettimeofday(0x000000011FFFABC8, 0x00000000)    = 0
>>>> 659636: select(5, 0x000000011FFFABE8, 0x00000000, 0x00000000,
>>>> 0x000000011FFFABC0) = 1
>>>> 659636: read(4, " < g m s h > < d l > 9 9".., 22)       = 22
>>>> 659636: gettimeofday(0x000000011FFFABC8, 0x00000000)    = 0
>>>> 659636: select(5, 0x000000011FFFABE8, 0x00000000, 0x00000000,
>>>> 0x000000011FFFABC0) = 1
>>>> 659636: read(4, " h", 1)                                = 1
>>>> 659636: gettimeofday(0x000000011FFFABC8, 0x00000000)    = 0
>>>> 659636: select(5, 0x000000011FFFABE8, 0x00000000, 0x00000000,
>>>> 0x000000011FFFABC0) = 1
>>>> 659636: read(4, " >", 1)                                = 1
>>>> 659636: gettimeofday(0x000000011FFFABD8, 0x00000000)    = 0
>>>> 659636: select(5, 0x000000011FFFABF8, 0x00000000, 0x00000000,
>>>> 0x000000011FFFABD0) = 1
>>>> 659636: read(4, " < m i h   v e r s i o n".., 99)       = 99
>>>> 659636: gettimeofday(0x000000011FFFABD8, 0x00000000)    = 0
>>>> 659636: select(5, 0x000000011FFFABF8, 0x00000000, 0x00000000,
>>>> 0x000000011FFFABD0) = 1
>>>> 659636: read(4, "\0\0\0\01002\0\0\0\0\001".., 373)      = 373
>>>> 659636: gettimeofday(0x00000001400C29C8, 0x00000000)    = 0
>>>> 659636: gettimeofday(0x000000011FFFAE30, 0x00000000)    = 0
>>>> 659636: gettimeofday(0x000000011FFFAEC0, 0x00000000)    = 0
>>>> 659636:     Incurred fault #32, FLTBOUNDS  %pc = 0x000003FF800D2590
>>>> addr = 0x000000011FFF9420
>>>> 659636:     Received signal #11, SIGSEGV [caught]
>>>> 659636:       siginfo: SIGSEGV SEGV_MAPERR addr=0x0000000020746365
>>>> 659636: sigaltstack(0x00000000, 0x000000011FFF8860)     = 0
>>>> 659636: sigprocmask(SIG_BLOCK, 0x00000000, 0x00000000)  = -108655535
>>>> 659636: sigstack(0x00000000, 0x000000011FFF87D8)        = 0
>>>> 659636: sigprocmask(SIG_UNBLOCK, 0x00000400, 0x00000000) = -108655535
>>>> 659636: sigaction(SIGSEGV, 0x000000011FFF8648, 0x00000000) = 0
>>>> 659636:     Received signal #11, SIGSEGV [default]
>>>> 659636:       siginfo: SIGSEGV
>>>>                                                Err#139 Error 139 occurred.
>>>> 659636:         *** process killed ***
>>>>
>>>> -------------------------------------------------------------------------->>>> I send to you the complete exit for the 'stacktrace' as attached
>>>> file. However, the date it seems correct.
>>>> ----------------------------------
>>>> sc1/esfreire> date
>>>> Fri Jul 28 09:47:57 CEST 2006
>>>> ----------------------------------
>>>>
>>>> Thanks for answering!
>>>>
>>>>
>>>>> Hi Esteban,
>>>>>
>>>>> could you provide a stacktrace for the seg fault?
>>>>> That would help to understand the problem.
>>>>>
>>>>> Regards,
>>>>> Andreas
>>>>>
>>>>> On Thu, 27 Jul 2006, Esteban Freire Garcia wrote:
>>>>>
>>>>>>
>>>>>> Thanks for answering. I am using SGE version 6.0:
>>>>>> ----------------------------------------------
>>>>>> sc1/esfreire> qconf -help | grep 6.0
>>>>>> SGE 6.0
>>>>>> ---------------------------------------------
>>>>>> The 'Segmentation fault' message is shown only when I run
>>>>>> 'qconf -mu esfreire' or 'qconf -su esfreire'; the other
>>>>>> commands I use to administer SGE do not produce
>>>>>> this message.
>>>>>>
>>>>>> This message had never appeared before. It started about a month
>>>>>> ago, and since then it appears every time I run qconf -su |
>>>>>> -mu. With qmon, however, I can view and edit the list. I believe
>>>>>> it may be caused by a list that I created incorrectly or that
>>>>>> contains incorrect data.
>>>>>>
>>>>>> Thanks,
>>>>>> Esteban
>>>>>>
>>>>>>> Reuti wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 27.07.2006 at 09:14, Esteban Freire Garcia wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi everybody,
>>>>>>>>>
>>>>>>>>> I have a Compaq HPC320 machine running UNIX (Tru64 V5.1A)
>>>>>>>>> and SGE (sge 6.0 - tru64). Now, when I run:
>>>>>>>>> --------------------------------------------------------------------
>>>>>>>>> sc1/esfreire> qconf -su esfreire
>>>>>>>>> Segmentation fault
>>>>>>>>>
>>>>>>>>> sc1/root> qconf -su esfreire
>>>>>>>>> Memory fault
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> Could somebody tell me why the "Segmentation fault" message is
>>>>>>>>> shown and what the solution is?
>>>>>>>>> With qmon, however, I can view the list and edit it.
>>>>>>>>
>>>>>>>>
>>>>>>>> This error message usually indicates a programming error, as the
>>>>>>>> software tries to access an illegal address. So it shouldn't
>>>>>>>> happen at all. Do you get this error only with "-su", or also
>>>>>>>> with other options? Did it work at any time before, and only
>>>>>>>> now refuses to operate?
>>>>>>>
>>>>>>> I've tried to reproduce the segfault in our lab, but for me it
>>>>>>> works fine. We have Tru64 V5.0.
>>>>>>>
>>>>>>> Can you please tell us which update version you use? You can
>>>>>>> figure this out with 'qconf -help | grep 6.0'.
>>>>>>>
>>>>>>> Roland
>>>>>>>
>>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------->>>>>>>> To unsubscribe, e-mail:
>>>>>>>> users-unsubscribe at gridengine.sunsource.net For additional
>>>>>>>> commands, e-mail:
>>>>>>>> users-help at gridengine.sunsource.net
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>>>>> Roland Dittel               Tel: +49 (0)941 3075-275 (x60275)
>>>>>>> Software Engineering        Fax: +49 (0)941 3075-222 (x60222)
>>>>>>> Sun Microsystems GmbH
>>>>>>> Dr.-Leo-Ritter-Str. 7       mailto:roland.dittel at sun.com
>>>>>>> D-93049 Regensburg          http://www.sun.com/gridware
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



