RE : [GE users] RE : RE : [GE users] qmaster blow-up

GARDAIS Ionel Ionel.Gardais at tech-advantage.com
Fri May 11 21:54:25 BST 2007



Also, I confirm that despite the qmaster crash, the modification to the resource quota is accepted and applied to newly submitted jobs (after restarting qmaster, obviously).

Ionel


-------- Original message --------
From: ionel.gardais at tech-advantage.com [mailto:ionel.gardais at tech-advantage.com]
Date: Fri 11/05/2007 21:55
To: users at gridengine.sunsource.net
Subject: [GE users] RE : RE : [GE users] qmaster blow-up
 
I made it crash.
Here is the backtrace right after the crash.

(gdb) bt
#0  0x0000002a95a5d445 in raise () from /lib64/tls/libc.so.6
#1  0x0000002a95a5ebb3 in abort () from /lib64/tls/libc.so.6
#2  0x00000000004f8059 in lGetPosViaElem ()
#3  0x00000000004f88a7 in lGetString ()
#4  0x00000000004fe20c in lDiffListStr ()
#5  0x0000000000470ccb in rqs_diff_projects ()
#6  0x0000000000470db4 in rqs_update_categories ()
#7  0x000000000047119d in rqs_success ()
#8  0x000000000042d592 in sge_gdi_add_mod_generic ()
#9  0x000000000042bf4e in sge_c_gdi_mod ()
#10 0x0000000000428fe4 in sge_c_gdi ()
#11 0x000000000046bd69 in do_gdi_request ()
#12 0x000000000046ba2c in sge_qmaster_process_message ()
#13 0x000000000042873e in message_thread ()
#14 0x0000002a9591fc64 in start_thread () from /lib64/tls/libpthread.so.0
#15 0x0000002a95b05ec3 in thread_start () from /lib64/tls/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb)

If you need anything more, just ask.
And feel free to send specific commands you want me to run in gdb: I'm not that confident with it :)
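
One possible way to collect a bit more than the single-thread backtrace above is to put the gdb commands in a command file and run gdb in batch mode. This is only a sketch: the file name is made up for illustration, SGE_ND is assumed to be set already, and since the binary appears to carry no debug symbols the output will still lack arguments and line numbers.

```shell
# Hypothetical location for the command file; any writable path works.
GDB_CMDS="${TMPDIR:-/tmp}/qmaster-crash.gdb"

# gdb commands, executed in order:
#   run                  start qmaster and wait for it to stop (here: on abort)
#   bt                   backtrace of the thread that received the signal
#   info threads         list all threads of the process
#   thread apply all bt  backtrace of every thread
cat > "$GDB_CMDS" <<'EOF'
run
bt
info threads
thread apply all bt
EOF

# Print the invocation instead of running it, so nothing starts by accident.
echo "gdb -batch -x $GDB_CMDS \$SGE_ROOT/bin/lx24-amd64/sge_qmaster"
```

The same four commands can also be typed one by one at the (gdb) prompt; the command file only saves typing when the crash has to be reproduced several times.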

Ionel


-----Original message-----
From: Andreas.Haas at Sun.COM [mailto:Andreas.Haas at Sun.COM]
Sent: Friday, May 11, 2007 21:08
To: users at gridengine.sunsource.net
Subject: Re: RE : [GE users] qmaster blow-up

Hi Ionel,

Well, if it is reproducible, that is good news! ;-)

Please try to run qmaster under gdb(1). You can do this 
by running

    # setenv SGE_ND
    # gdb $SGE_ROOT/bin/lx24-amd64/sge_qmaster

Note that SGE_ND (= no daemonize) must be set to prevent qmaster from
daemonizing. Since the crash occurs during a configuration change,
you will most probably not need to start sge_schedd.
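
The commands above use csh syntax. For a Bourne-type shell (sh, bash), a rough equivalent might look like the sketch below. It assumes qmaster only checks whether SGE_ND is present in the environment, so the value 1 is arbitrary.

```shell
# Bourne-shell counterpart of the csh "setenv SGE_ND".
# Assumption: only the presence of the variable matters, not its value.
SGE_ND=1
export SGE_ND

# Print the gdb invocation rather than running it here.
echo "gdb \$SGE_ROOT/bin/lx24-amd64/sge_qmaster"
```

Once inside gdb, "run" starts qmaster in the foreground and "bt" prints the backtrace after the crash.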

Regards,
Andreas


On Fri, 11 May 2007, GARDAIS Ionel wrote:

> Andy,
>
> Sure, netbeans could be a good choice as an IDE.
> However, this is not the 'eclipse' you are thinking of. (Actually, it's the Eclipse simulator from Schlumberger, a petroleum-industry application.)
>
> I am able to reproduce the crash.
> To do so, I just add a second 'limit' line after the first one. (It works flawlessly with a single limit line.)
>
> Just running 'qconf -mrqs eclipse' and quitting *without* saving causes qmaster to crash.
> Note that when I change something in the file and save, qmaster crashes *but* the modification is accepted. ('qconf -srqs' shows the modified version of the resource quota.)
>
> The qmaster host is running RedHat 3 (Update 4 or 5); kernel 2.4.
> Hardware is a dual Opteron 2.2 GHz with 4 GB RAM.
> The administrative host from which I initiate the modification is running RedHat 4 Update 4; kernel 2.6.
> I've cheated in the SGE bin/lib/utilbin directories with a symlink from lx24-amd64/ to lx26-amd64/ (so the 2.6 host can use SGE without compiling it from source).
>
> If you need more information, just ask.
>
> Ionel
>
>
> -------- Original message --------
> From: Andy Schwierskott [mailto:andy.schwierskott at sun.com]
> Date: Fri 11/05/2007 16:43
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] qmaster blow-up
>
> Ionel,
>
>    % qconf -srqs | sed 's/eclipse/netbeans/' > /tmp/sgerqs.txt; qconf -Mrqs /tmp/sgerqs.txt
>
> Just a silly joke from a Sun employee of course .... °-)
>
> Seriously: we can't reproduce your issue, so I'm afraid we'll need more
> information from you. Are you able to reproduce the crash? What OS is your
> qmaster host running?
>
> Andy
>
>> It happens when I edit (without even saving) this resource quota:
>>
>>
>> $ qconf -srqs eclipse
>> {
>>   name         eclipse
>>   description  Limit eclipse job to 8 licenses
>>   enabled      TRUE
>>   limit        projects eclipse hosts @regatta to slots=3
>>   limit        projects eclipse to slots=5
>> }
>>
>> I also tried :
>>   limit        projects {eclipse} hosts {@regatta} to slots=3
>>   limit        projects {eclipse} hosts {*} to slots=5
>>
>> But it quits with the same error message.
>>
>> Ionel
>>
>> Ionel GARDAIS wrote:
>>> Hi,
>>>
>>> Our qmaster has just blown up.
>>> The last entry in the messages file is:
>>>
>>> 05/11/2007 14:16:19|qmaster|orion|C|!!!!!!!!!! UP_name not found in
>>> element !!!!!!!!!!
>>>
>>> Unfortunately, I have no more information to provide.
>>> What could the problem be?
>>>
>>> Thanks,
>>> Ionel
>
>
>

http://gridengine.info/


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



