[GE users] schedd hangs with infinite loop :-((

Christian Kauhaus ckauhaus at informatik.uni-jena.de
Thu Apr 1 11:03:07 BST 2004


Hello!

Since yesterday I've had a serious problem with our SGEEE 5.3p5
installation: for no apparent reason the scheduler goes into an
infinite loop and stops scheduling.

I would really appreciate quick help on this, since our cluster isn't
working anymore.

The scheduler runs on Linux x86, kernel 2.4.22 with glibc 2.3.2, on
Debian testing. We have only a moderate number of jobs waiting in the
queue. To narrow down the problem, I've set loglevel to 2 with dl.sh.

This is the backtrace from a core of the looping sge_schedd (terminated
with SIGQUIT):

# gdb /usr/local/sge53/bin/glinux/sge_schedd core.16309 
GNU gdb 5.3-debian
This GDB was configured as "i386-linux"...
Core was generated by `sge_schedd'.
Program terminated with signal 3, Quit.
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0  0x400b73f6 in mallopt () from /lib/libc.so.6
(gdb) bt
#0  0x400b73f6 in mallopt () from /lib/libc.so.6
#1  0x400b727e in mallopt () from /lib/libc.so.6
#2  0x400b5faf in free () from /lib/libc.so.6
#3  0x080a3520 in lFreeElem ()
#4  0x080a3a11 in lRemoveElem ()
#5  0x080a356d in lFreeList ()
#6  0x0807bab9 in free_fcategories ()
#7  0x0807f135 in sge_calc_tickets ()
#8  0x08080f59 in sge_scheduler ()
#9  0x0804e7a3 in dispatch_jobs ()
#10 0x0804df06 in scheduler ()
#11 0x0805168a in event_handler_default_scheduler ()
#12 0x0804a86a in main ()
#13 0x4005adc6 in __libc_start_main () from /lib/libc.so.6
(gdb) 
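
A single backtrace only shows where the process was at one moment. To see
which frames stay the same while the loop is running, several backtraces
taken a few seconds apart can be compared. The following is a minimal
sketch in Python, not part of SGE or gdb; the helper names parse_frames
and common_outer_frames are made up for illustration, and it assumes the
plain "#N 0xADDR in func ()" line format shown in the session above.

```python
import re

# Matches gdb backtrace lines such as
#   "#3  0x080a3520 in lFreeElem ()"
#   "#2  0x400b5faf in free () from /lib/libc.so.6"
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def parse_frames(bt_text):
    """Return the function names of a gdb backtrace, innermost frame first."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(2))
    return frames


def common_outer_frames(bt_a, bt_b):
    """Return the frames shared by two backtraces, outermost frame first.

    Comparison starts at the outermost frame (e.g. main) and stops at the
    first frame where the two backtraces differ.
    """
    shared = []
    frames_a = reversed(parse_frames(bt_a))
    frames_b = reversed(parse_frames(bt_b))
    for a, b in zip(frames_a, frames_b):
        if a != b:
            break
        shared.append(a)
    return shared
```

If two backtraces share every frame down to, say, free_fcategories but
differ below it, that would suggest the loop sits in or below that
function rather than further up in the scheduler.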

I've attached the schedd debug log.

Regards
  Christian

-- 
Dipl.-Inf. Christian Kauhaus                               <><
Lehrstuhl Rechnerarchitektur und -kommunikation 
Institut fuer Informatik · Ernst-Abbe-Platz 1-2 · D-07743 Jena
Tel.: (+49) 3641 9 46376 · Fax: (+49) 3641 9 46372 · Raum 3217

