[GE users] Segmentation fault to do qconf -su group

Andreas.Haas at Sun.COM Andreas.Haas at Sun.COM
Mon Aug 14 13:06:39 BST 2006


Hi Esteban,

Good!

However, the output still only tells me that qconf crashed in strlen():

    Segmentation fault at >*[strlen, 0x3ff800d2590]

What I need to actually locate the problem is a full stack
trace.

You can get a stack trace from dbx by entering the 'where' command
after the segmentation fault has happened.
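
As a side note, the two traces posted so far can at least be cross-checked
against each other. Below is a minimal sketch in Python; the values are
copied from the dbx and truss output quoted further down, and the helper
names are made up for illustration. It compares the program counter
reported by truss with the strlen address reported by dbx, and renders the
faulting address from the siginfo line as the little-endian bytes it would
occupy in memory (Alpha is little-endian). If such an address decodes to
printable text, one possible reading, and only a guess at this point, is
that a pointer was overwritten with string data before strlen() was called.

```python
# Values copied verbatim from the dbx and truss output quoted in this thread.
DBX_STRLEN_PC = "0x3ff800d2590"          # dbx: Segmentation fault at >*[strlen, ...]
TRUSS_FAULT_PC = "0x000003FF800D2590"    # truss: Incurred fault #32, FLTBOUNDS %pc = ...
TRUSS_SEGV_ADDR = "0x0000000020746365"   # truss: siginfo: SIGSEGV SEGV_MAPERR addr=...


def same_address(a: str, b: str) -> bool:
    """Compare two hex address strings, ignoring case and zero padding."""
    return int(a, 16) == int(b, 16)


def address_as_text(addr: str, width: int = 8) -> str:
    """Render an address as the little-endian bytes it would occupy in memory.

    Trailing zero bytes are dropped; non-printable bytes are shown as '.'.
    """
    raw = int(addr, 16).to_bytes(width, "little").rstrip(b"\x00")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    print(same_address(DBX_STRLEN_PC, TRUSS_FAULT_PC))   # True
    print(repr(address_as_text(TRUSS_SEGV_ADDR)))        # 'ect '
```

Both traces agree on the crash location, and the faulting address reads as
"ect ". That still does not say which caller handed the bad pointer to
strlen(), which is why the stack trace is needed.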

Thanks,
Andreas

On Mon, 14 Aug 2006, Esteban Freire Garcia wrote:

>
> Hi,
>
> Sorry, I couldn't reply because I was on holiday. I have now run
> qconf -su | -mu under dbx and this is the result:
> --------------------------------------------------------------------------
> sc>dbx qconf.out
> dbx version 5.1
> Type 'help' for help.
>
> (dbx) run
> SGE 6.0
> usage: qconf [options]
>   [-aattr obj_nm attr_nm val obj_id_lst]   add to a list attribute of an
> object
>   [-Aattr obj_nm fname obj_id_lst]         add to a list attribute of an
> object
>   [-acal calendar_name]                    add a new calendar
>   [-Acal fname]                            add a new calendar from file
>   [-ackpt ckpt_name]                       add a ckpt interface definition
>   [-Ackpt fname]                           add a ckpt interface definition
> from file
>   [-aconf host_list]                       add configurations
>   [-Aconf file_list]                       add configurations from file_list
>   [-ae [exec_server_template]]             add an exec host using a template
>   [-Ae fname]                              add an exec host from file
>   [-ah hostname]                           add an administrative host
>   [-ahgrp group]                           add new host group entry
>   [-Ahgrp file]                            add new host group entry from
> file
>   [-am user_list]                          add user to manager list
>   [-ao user_list]                          add user to operator list
>   [-ap pe-name]                            add a new parallel environment
>   [-Ap fname]                              add a new parallel environment
> from file
>   [-aprj]                                  add project
>   [-Aprj fname]                            add project from file
>   [-aq ]                                   add a new cluster queue
>   [-Aq fname]                              add a queue from file
>   [-as hostname]                           add a submit host
>   [-astnode node_shares_list]              add sharetree node(s)
>   [-astree]                                create/modify the sharetree
>   [-Astree fname]                          create/modify the sharetree from
> file
>   [-au user_list listname_list]            add user(s) to userset list(s)
>   [-Au fname]                              add userset from file
>   [-auser]                                 add user
>   [-Auser fname]                           add user from file
>   [-clearusage]                            clear all user/project sharetree
> usage
>   [-cq destin_id_list]                     clean queue
>   [-dattr obj_nm attr_nm val obj_id_lst]   delete from a list attribute of
> an object
>   [-Dattr obj_nm fname obj_id_lst]         delete from a list attribute of
> an object
>   [-dcal calendar_name]                    remove a calendar
>   [-dckpt ckpt_name]                       remove a ckpt interface
> definition
>   [-dconf host_list]                       delete local configurations
>   [-de host_list]                          remove an exec server
>   [-dh host_list]                          remove an administrative host
>   [-dhgrp group]                           delete host group entry
>   [-dm user_list]                          remove user from manager list
>   [-do user_list]                          remove user from operator list
>   [-dp pe-name]                            remove a parallel environment
>   [-dprj project_list]                     delete project
>   [-dq destin_id_list]                     remove a queue
>   [-ds host_list]                          remove submit host
>   [-dstnode node_list]                     remove sharetree node(s)
>   [-dstree]                                delete the sharetree
>   [-du user_list listname_list]            remove user(s) from userset
> list(s)
>   [-dul listname_list]                     remove userset list(s) completely
>   [-duser user_list]                       delete user
>   [-help]                                  print this help
>   [-ke[j] host_list                        shutdown execution daemon(s)
>   [-k{m|s}]                                shutdown master|scheduling daemon
>   [-kec evid_list]                         kill event client
>   [-mattr obj_nm attr_nm val obj_id_lst]   modify an attribute (or element
> in a sublist) of an object
>   [-Mattr obj_nm fname obj_id_lst]         modify an attribute (or element
> in a sublist) of an object
>   [-mc ]                                   modify complex attributes
>   [-mckpt ckpt_name]                       modify a ckpt interface
> definition
>   [-Mc fname]                              modify complex attributes from
> file
>   [-mcal calendar_name]                    modify calendar
>   [-Mcal fname]                            modify calendar from file
>   [-Mckpt fname]                           modify a ckpt interface
> definition from file
>   [-mconf [host_list|global]]              modify configurations
>   [-msconf]                                modify scheduler configuration
>   [-Msconf fname]                          modify scheduler configuration
> from file
>   [-me server]                             modify exec server
>   [-Me fname]                              modify exec server from file
>   [-mhgrp group]                           modify host group entry
>   [-Mhgrp file]                            modify host group entry from file
>   [-mp pe-name]                            modify a parallel environment
>   [-Mp fname]                              modify a parallel environment
> from file
>   [-mprj project]                          modify a project
>   [-Mprj fname]                            modify project from file
>   [-mq queue]                              modify a queue
>   [-Mq fname]                              modify a queue from file
>   [-mstnode node_shares_list]              modify sharetree node(s)
>   [-Mstree fname]                          modify/create the sharetree from
> file
>   [-mstree]                                modify/create the sharetree
>   [-mu listname_list]                      modify the given userset list
>   [-Mu fname]                              modify userset from file
>   [-muser user]                            modify a user
>   [-Muser fname]                           modify a user from file
>   [-rattr obj_nm attr_nm val obj_id_lst]   replace a list attribute of an
> object
>   [-Rattr obj_nm fname obj_id_lst]         replace a list attribute of an
> object
>   [-sc ]                                   show complex attributes
>   [-scal calendar_name]                    show given calendar
>   [-scall]                                 show a list of all calendar names
>   [-sckpt ckpt_name]                       show ckpt interface definition
>   [-sckptl]                                show all ckpt interface
> definitions
>   [-sconf [host_list|global]]              show configurations
>   [-sconfl]                                show a list of all local
> configurations
>   [-se server]                             show given exec server
>   [-secl]                                  show event client list
>   [-sel]                                   show a list of all exec servers
>   [-sep]                                   show a list of all licensed
> processors
>   [-sh]                                    show a list of all
> administrative hosts
>   [-shgrp group]                           show host group
>   [-shgrp_tree group]                      show host group and used
> hostgroups as tree
>   [-shgrp_resolved group]                  show host group with resolved
> hostlist
>   [-shgrpl]                                show host group list
>   [-sds]                                   show detached settings
>   [-sm]                                    show a list of all managers
>   [-so]                                    show a list of all operators
>   [-sobjl obj_nm2 attr_nm val]             show objects which match the
> given value
>   [-sp pe-name]                            show a parallel environment
>   [-spl]                                   show all parallel environments
>   [-sprj project]                          show a project
>   [-sprjl]                                 show a list of all projects
>   [-sq [destin_id_list]]                   show the given queue
>   [-sql]                                   show a list of all queues
>   [-ss]                                    show a list of all submit hosts
>   [-sss]                                   show scheduler state
>   [-ssconf]                                show scheduler configuration
>   [-sstnode node_list]                     show sharetree node(s)
>   [-rsstnode node_list]                    show sharetree node(s) and its
> children
>   [-sstree]                                show the sharetree
>   [-su listname_list]                      show the given userset list
>   [-suser user_list]                       show user(s)
>   [-sul]                                   show a list of all userset lists
>   [-suserl]                                show a list of all users
>   [-tsm]                                   trigger scheduler monitoring
> complex_list            complex[,complex,...]
> destin_id_list          queue[ queue ...]
> listname_list           listname[,listname,...]
> node_list               node_path[,node_path,...]
> node_path               [/]node_name[[/.]node_name...]
> node_shares_list        node_path=shares[,node_path=shares,...]
> user_list               user|pattern[,user|pattern,...]
> obj_nm                  "queue"|"exechost"|"pe"|"ckpt"|"hostgroup"
> attr_nm                 (see man pages)
> obj_id_lst              objectname [ objectname ...]
> project_list            project[,project,...]
> evid_list               all | evid[,evid,...]
> host_list               all | hostname[,hostname,...]
> obj_nm2                 "queue"|"queue_domain"|"queue_instance"|"exechost"
>
> Program terminated normally
>
> (dbx) run -su cesga
> thread 0xb signal Segmentation fault at >*[strlen, 0x3ff800d2590]
> ldq_u   t0, 0(a0)
> (dbx) run -mu cesga
> thread 0x8 signal Segmentation fault at >*[strlen, 0x3ff800d2590]
> ldq_u   t0, 0(a0)
> --------------------------------------------------------------------------
>
> Thanks,
> Esteban
>
>
>> Hi Esteban,
>>
>> the attachment did not go through, and unfortunately the truss output
>> does not help in this case either.
>>
>> Can't you run qconf under the control of the dbx debugger? That would
>> tell you directly in which C function qconf crashes.
>>
>> Regards,
>> Andreas
>>
>>
>> On Fri, 28 Jul 2006, Esteban Freire Garcia wrote:
>>
>>>
>>> Hi, I ran the command 'truss -o qconf_strace -f qconf -su cesga',
>>> and this is the last part of the trace output, leading up to
>>> "Incurred fault..":
>>> --------------------------------------------------------------------------
>>> 659636: select(5, 0x000000011FFFAAD8, 0x00000000, 0x00000000,
>>> 0x000000011FFFAAB0) = 1
>>> 659636: read(4, " h", 1)                                = 1
>>> 659636: gettimeofday(0x000000011FFFAAB8, 0x00000000)    = 0
>>> 659636: select(5, 0x000000011FFFAAD8, 0x00000000, 0x00000000,
>>> 0x000000011FFFAAB0) = 1
>>> 659636: read(4, " >", 1)                                = 1
>>> 659636: gettimeofday(0x000000011FFFAAC8, 0x00000000)    = 0
>>> 659636: select(5, 0x000000011FFFAAE8, 0x00000000, 0x00000000,
>>> 0x000000011FFFAAC0) = 1
>>> 659636: read(4, " < m i h   v e r s i o n".., 97)       = 97
>>> 659636: gettimeofday(0x000000011FFFAAC8, 0x00000000)    = 0
>>> 659636: select(5, 0x000000011FFFAAE8, 0x00000000, 0x00000000,
>>> 0x000000011FFFAAC0) = 1
>>> 659636: read(4, " < a m   v e r s i o n =".., 35)       = 35
>>> 659636: gettimeofday(0x00000001400C2A48, 0x00000000)    = 0
>>> 659636: gettimeofday(0x000000011FFFAD20, 0x00000000)    = 0
>>> 659636: gettimeofday(0x000000011FFFAEC8, 0x00000000)    = 0
>>> 659636: gettimeofday(0x000000011FFFAED0, 0x00000000)    = 0
>>> 659636: gettimeofday(0x000000011FFFAEC8, 0x00000000)    = 0
>>> 659636: gettimeofday(0x000000011FFFAF38, 0x00000000)    = 0
>>> 659636: select(5, 0x000000011FFFACD8, 0x000000011FFFAAD8, 0x00000000,
>>> 0x000000011FFFAA90) = 1
>>> 659636: gettimeofday(0x000000011FFFAEE0, 0x00000000)    = 0
>>> 659636: gettimeofday(0x000000011FFFAEC0, 0x00000000)    = 0
>>> 659636: gettimeofday(0x000000011FFFABC8, 0x00000000)    = 0
>>> 659636: select(5, 0x000000011FFFABE8, 0x00000000, 0x00000000,
>>> 0x000000011FFFABC0) = 1
>>> 659636: read(4, " < g m s h > < d l > 9 9".., 22)       = 22
>>> 659636: gettimeofday(0x000000011FFFABC8, 0x00000000)    = 0
>>> 659636: select(5, 0x000000011FFFABE8, 0x00000000, 0x00000000,
>>> 0x000000011FFFABC0) = 1
>>> 659636: read(4, " h", 1)                                = 1
>>> 659636: gettimeofday(0x000000011FFFABC8, 0x00000000)    = 0
>>> 659636: select(5, 0x000000011FFFABE8, 0x00000000, 0x00000000,
>>> 0x000000011FFFABC0) = 1
>>> 659636: read(4, " >", 1)                                = 1
>>> 659636: gettimeofday(0x000000011FFFABD8, 0x00000000)    = 0
>>> 659636: select(5, 0x000000011FFFABF8, 0x00000000, 0x00000000,
>>> 0x000000011FFFABD0) = 1
>>> 659636: read(4, " < m i h   v e r s i o n".., 99)       = 99
>>> 659636: gettimeofday(0x000000011FFFABD8, 0x00000000)    = 0
>>> 659636: select(5, 0x000000011FFFABF8, 0x00000000, 0x00000000,
>>> 0x000000011FFFABD0) = 1
>>> 659636: read(4, "\0\0\0\01002\0\0\0\0\001".., 373)      = 373
>>> 659636: gettimeofday(0x00000001400C29C8, 0x00000000)    = 0
>>> 659636: gettimeofday(0x000000011FFFAE30, 0x00000000)    = 0
>>> 659636: gettimeofday(0x000000011FFFAEC0, 0x00000000)    = 0
>>> 659636:     Incurred fault #32, FLTBOUNDS  %pc = 0x000003FF800D2590
>>> addr = 0x000000011FFF9420
>>> 659636:     Received signal #11, SIGSEGV [caught]
>>> 659636:       siginfo: SIGSEGV SEGV_MAPERR addr=0x0000000020746365
>>> 659636: sigaltstack(0x00000000, 0x000000011FFF8860)     = 0
>>> 659636: sigprocmask(SIG_BLOCK, 0x00000000, 0x00000000)  = -108655535
>>> 659636: sigstack(0x00000000, 0x000000011FFF87D8)        = 0
>>> 659636: sigprocmask(SIG_UNBLOCK, 0x00000400, 0x00000000) = -108655535
>>> 659636: sigaction(SIGSEGV, 0x000000011FFF8648, 0x00000000) = 0
>>> 659636:     Received signal #11, SIGSEGV [default]
>>> 659636:       siginfo: SIGSEGV
>>>     Err#139 Error 139 occurred.
>>> 659636:         *** process killed ***
>>>
>>> --------------------------------------------------------------------------
>>> I am sending you the complete trace output as an attached file.
>>> However, the date seems to be correct.
>>> ----------------------------------
>>> sc1/esfreire> date
>>> Fri Jul 28 09:47:57 CEST 2006
>>> ----------------------------------
>>>
>>> Thanks for answering!
>>>
>>>
>>>> Hi Esteban,
>>>>
>>>> could you provide a stacktrace for the seg fault?
>>>> That would help to understand the problem.
>>>>
>>>> Regards,
>>>> Andreas
>>>>
>>>> On Thu, 27 Jul 2006, Esteban Freire Garcia wrote:
>>>>
>>>>>
>>>>> Thanks for answering. I am using SGE version 6.0:
>>>>> ----------------------------------------------
>>>>> sc1/esfreire> qconf -help | grep 6.0
>>>>> SGE 6.0
>>>>> ---------------------------------------------
>>>>> The 'segmentation fault' message is only shown when I run 'qconf
>>>>> -mu esfreire' or 'qconf -su esfreire'. The other commands that I
>>>>> use to administer SGE do not produce this message.
>>>>>
>>>>> It never used to show this message. It began about a month ago, and
>>>>> ever since it always shows the message when I run qconf -su | -mu.
>>>>> However, with qmon I can see and edit the list. I believe it may be
>>>>> due to some list that I created incorrectly or that contains
>>>>> incorrect data.
>>>>>
>>>>> Thanks,
>>>>> Esteban
>>>>>
>>>>>> Reuti wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 27.07.2006 at 09:14, Esteban Freire Garcia wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi everybody,
>>>>>>>>
>>>>>>>> I have a Compaq HPC320 machine with UNIX installed (Tru64 V5.1A)
>>>>>>>> and SGE (sge 6.0 - tru64). Now, when I run:
>>>>>>>> --------------------------------------------------------------------
>>>>>>>> sc1/esfreire> qconf -su esfreire
>>>>>>>> Segmentation fault
>>>>>>>>
>>>>>>>> sc1/root> qconf -su esfreire
>>>>>>>> Memory fault
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> Could any of you tell me why the "Segmentation fault" message is
>>>>>>>> shown and what the solution is? However, with qmon I can see the
>>>>>>>> list and edit it.
>>>>>>>
>>>>>>>
>>>>>>> this error message usually indicates a programming error, as the
>>>>>>> software tries to access an illegal address. So it shouldn't
>>>>>>> happen at all. Do you get this error only with "-su", or also
>>>>>>> with other options? Did it work at any time before, and does it
>>>>>>> only now refuse to operate?
>>>>>>
>>>>>> I've tried to reproduce the segfault in our lab, but for me it works
>>>>>> fine. We have Tru64 V5.0.
>>>>>>
>>>>>> Can you please tell us which update version you use? You can
>>>>>> figure this out with 'qconf -help | grep 6.0'.
>>>>>>
>>>>>> Roland
>>>>>>
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>> For additional commands, e-mail:
>>>>>>> users-help at gridengine.sunsource.net
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>>>> Roland Dittel               Tel: +49 (0)941 3075-275 (x60275)
>>>>>> Software Engineering        Fax: +49 (0)941 3075-222 (x60222)
>>>>>> Sun Microsystems GmbH
>>>>>> Dr.-Leo-Ritter-Str. 7       mailto:roland.dittel at sun.com
>>>>>> D-93049 Regensburg          http://www.sun.com/gridware
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



