[GE users] sge_shepherd in limbo (sge62u21)

onida lba777 at gmail.com
Mon Jun 1 20:00:03 BST 2009


Hi,

I recently updated from sge62 to sge62u21. Jobs spawned through qrsh that worked fine before the update are now running into trouble. Specifically, the slave node finishes the job, but sge_shepherd is left waiting endlessly. I tried this a few times, and each time one sge_shepherd on one of the nodes ends up stuck this way.

I would appreciate any pointers or hints on tackling this.

onida

Here is some relevant information from ps, strace and gdb.


onida at node8:~$ ps -ef | grep sge
sgeadmin 28702     1  0 May31 ?        00:05:07 /l2/sge62u21/bin/lx24-amd64/sge_execd
sgeadmin 28792 28702  0 May31 ?        00:00:00 sge_shepherd-236 -bg

onida at node8:~$ sudo strace -p 28792
Process 28792 attached - interrupt to quit
futex(0x7fffae210a40, FUTEX_WAIT, 1, NULL <unfinished ...>

onida at node8:~$ sudo gdb /l2/sge62u21/bin/lx24-amd64/sge_shepherd 28792
GNU gdb 6.4.90-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".

Attaching to program: /l2/sge62u21/bin/lx24-amd64/sge_shepherd, process 28792
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libpthread.so.0...
(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 47863064713984 (LWP 28792)]
[New Thread 1124096352 (LWP 28810)]
[New Thread 1107310944 (LWP 28800)]
[New Thread 1098918240 (LWP 28796)]
[New Thread 1090525536 (LWP 28795)]
[New Thread 1082132832 (LWP 28794)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...
(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...
(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
0x00002b87fcc387aa in __nptl_setxid ()
   from /lib/libpthread.so.0
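
The gdb output above shows only the frame of the thread gdb stopped in, although six threads (LWPs) are listed. As a possible next step, here is a minimal sketch for seeing what each thread is waiting on. It assumes a Linux /proc filesystem; the function name list_thread_states is made up for illustration and is not part of SGE.

```shell
# list_thread_states PID
# Hypothetical helper (not an SGE tool): print one line per thread (LWP)
# of PID with its scheduler state and kernel wait channel, as read from
# /proc. Linux only; wchan may need root and is shown as "?" if unreadable.
list_thread_states() {
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        [ -d "$task" ] || continue
        tid=${task##*/}
        # The State: line looks like "State:  S (sleeping)"
        state=$(awk '/^State:/ {print $2, $3}' "$task/status" 2>/dev/null)
        wchan=$(cat "$task/wchan" 2>/dev/null)
        printf '%s\t%s\t%s\n' "$tid" "$state" "${wchan:-?}"
    done
}
```

For the process above this would be run as root with `list_thread_states 28792`. Inside the attached gdb session, `thread apply all bt` prints a backtrace for every thread instead of only the current one, which may show which thread the futex wait in __nptl_setxid depends on.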

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=200191
