[GE users] no hosts in Inspect cluster

andre Andre.Alefeld at sun.com
Wed Nov 11 09:07:38 GMT 2009


Hi Fred,

we ran into a similar problem on a linux-amd64 machine, and it may be
related. The symptoms match: the log files do not show up and you cannot
connect to the JVM thread. Attaching a debugger to the master gives a
hint about where things go wrong. In our case libjvm is loaded during
the create_vm call, but the JVM thread then gets stuck in a read() call
in libc (libc.so.6). So far we have seen this only on an Ubuntu Linux
machine. You could attach to your master as well and check whether it is
the same problem. As root, attach gdb to the running master as shown
below; then we can maybe find some more hints:

# gdb -pid=8432 $SGE_ROOT/bin/lx26-amd64/sge_qmaster
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
Attaching to program: /ts/jaapi_ts/bin/lx26-amd64/sge_qmaster, process 8432
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x7fd96fd306e0 (LWP 8432)]
[New Thread 0x46ff4950 (LWP 8445)]
[New Thread 0x467f3950 (LWP 8444)]
[New Thread 0x45ff2950 (LWP 8443)]
[New Thread 0x457f1950 (LWP 8442)]
[New Thread 0x44ff0950 (LWP 8441)]
[New Thread 0x447ef950 (LWP 8440)]
[New Thread 0x43fee950 (LWP 8439)]
[New Thread 0x437ed950 (LWP 8438)]
[New Thread 0x40e89950 (LWP 8433)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from 
/cod_home/aa114085/ts/jaapi_ts/lib/lx26-amd64/libssl.so...done.
Loaded symbols for 
/cod_home/aa114085/ts/jaapi_ts/bin/lx26-amd64/../../lib/lx26-amd64/libssl.so
Reading symbols from 
/cod_home/aa114085/ts/jaapi_ts/lib/lx26-amd64/libcrypto.so.0.9.8...done.
Loaded symbols for 
/cod_home/aa114085/ts/jaapi_ts/bin/lx26-amd64/../../lib/lx26-amd64/libcrypto.so.0.9.8
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from 
/cod_home/aa114085/ts/jaapi_ts/lib/lx26-amd64/libspoolb.so...done.
Loaded symbols for 
/cod_home/aa114085/ts/jaapi_ts/bin/lx26-amd64/../../lib/lx26-amd64/libspoolb.so
Reading symbols from 
/cod_home/aa114085/ts/jaapi_ts/lib/lx26-amd64/libdb-4.4.so...done.
Loaded symbols for 
/cod_home/aa114085/ts/jaapi_ts/bin/lx26-amd64/../../lib/lx26-amd64/../../lib/lx26-amd64/libdb-4.4.so
Reading symbols from 
/vol2/tools/SW/jdk1.5.0/jdk1.5.0_16/lx24-amd64/jre/lib/amd64/server/libjvm.so...done.
Loaded symbols for 
/vol2/tools/SW/jdk1.5.0/lx24-amd64/jre/lib/amd64/server/libjvm.so
0x00007fd96f48ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib/libpthread.so.0
(gdb) info threads
  14 Thread 0x40e89950 (LWP 8433)  0x00007fd96f48ae1d in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  13 Thread 0x417e9950 (LWP 8434)  0x00007fd96f48ae1d in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  12 Thread 0x41fea950 (LWP 8435)  0x00007fd96f1ecc96 in poll () from 
/lib/libc.so.6
  11 Thread 0x427eb950 (LWP 8436)  0x00007fd96f48ae1d in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  10 Thread 0x42fec950 (LWP 8437)  0x00007fd96f48e4f9 in do_sigwait () 
from /lib/libpthread.so.0
  9 Thread 0x437ed950 (LWP 8438)  0x00007fd96f48ae1d in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  8 Thread 0x43fee950 (LWP 8439)  0x00007fd96f48ae1d in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  7 Thread 0x447ef950 (LWP 8440)  0x00007fd96f48ae1d in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  6 Thread 0x44ff0950 (LWP 8441)  0x00007fd96f48ae1d in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  5 Thread 0x457f1950 (LWP 8442)  0x00007fd96f48ae1d in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  4 Thread 0x45ff2950 (LWP 8443)  0x00007fd96f48ae1d in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  3 Thread 0x467f3950 (LWP 8444)  0x00007fd96f48ae1d in 
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  2 Thread 0x46ff4950 (LWP 8445)  0x00007fd96f1e7b7b in read () from 
/lib/libc.so.6
  1 Thread 0x7fd96fd306e0 (LWP 8432)  0x00007fd96f48ab99 in 
pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
(gdb) thread 2
[Switching to thread 2 (Thread 0x46ff4950 (LWP 8445))]#0  
0x00007fd96f1e7b7b in read () from /lib/libc.so.6
(gdb) where
#0  0x00007fd96f1e7b7b in read () from /lib/libc.so.6
#1  0x00007fd96f18cf18 in _IO_file_underflow () from /lib/libc.so.6
#2  0x00007fd96f18d98e in _IO_default_uflow () from /lib/libc.so.6
#3  0x00007fd96f188ffb in getc () from /lib/libc.so.6
#4  0x00007fd96d6e7899 in find_vma () from 
/vol2/tools/SW/jdk1.5.0/lx24-amd64/jre/lib/amd64/server/libjvm.so
#5  0x00007fd96d6e3ef0 in os::Linux::capture_initial_stack () from 
/vol2/tools/SW/jdk1.5.0/lx24-amd64/jre/lib/amd64/server/libjvm.so
#6  0x00007fd96d6e66d7 in os::init_2 () from 
/vol2/tools/SW/jdk1.5.0/lx24-amd64/jre/lib/amd64/server/libjvm.so
#7  0x00007fd96d7a8618 in Threads::create_vm () from 
/vol2/tools/SW/jdk1.5.0/lx24-amd64/jre/lib/amd64/server/libjvm.so
#8  0x00007fd96d52fd87 in JNI_CreateJavaVM () from 
/vol2/tools/SW/jdk1.5.0/lx24-amd64/jre/lib/amd64/server/libjvm.so
#9  0x0000000000435855 in sge_jvm_main (arg=<value optimized out>) at 
../daemons/qmaster/sge_thread_jvm.c:581
#10 0x00007fd96f4863f7 in start_thread () from /lib/libpthread.so.0
#11 0x00007fd96f1f5b4d in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()
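
If you want to compare your own "info threads" output against the one
above, a small script can pick out the threads that are not simply
waiting. This is only a rough sketch: the function name, the regular
expression, and the list of "idle" top frames are my own assumptions for
illustration, and none of it is part of Grid Engine or gdb.

```python
import re

# Top-frame functions treated as "idle, waiting for work". This list is an
# assumption made for this sketch, not an official classification.
IDLE_FRAMES = ("pthread_cond_wait", "pthread_cond_timedwait", "poll", "do_sigwait")

# One "info threads" entry after wrapped lines have been joined, e.g.
#   2 Thread 0x46ff4950 (LWP 8445) 0x00007fd96f1e7b7b in read () from /lib/libc.so.6
ENTRY = re.compile(
    r"(?P<num>\d+)\s+Thread\s+0x[0-9a-f]+\s+\(LWP\s+(?P<lwp>\d+)\)\s+"
    r"0x[0-9a-f]+\s+in\s+(?P<func>[^\s(]+)"
)


def non_idle_threads(info_threads_text):
    """Return (gdb thread number, LWP, top-frame function) for non-idle threads."""
    # Collapse all whitespace so entries wrapped by the mail client match again.
    joined = " ".join(info_threads_text.split())
    result = []
    for match in ENTRY.finditer(joined):
        # Drop a symbol version suffix such as "@@GLIBC_2.3.2".
        func = match.group("func").split("@@")[0]
        if func not in IDLE_FRAMES:
            result.append((int(match.group("num")), int(match.group("lwp")), func))
    return result


# A few entries copied from the gdb session above, including the line wraps.
SAMPLE = """\
  12 Thread 0x41fea950 (LWP 8435)  0x00007fd96f1ecc96 in poll () from
/lib/libc.so.6
  3 Thread 0x467f3950 (LWP 8444)  0x00007fd96f48ae1d in
pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
  2 Thread 0x46ff4950 (LWP 8445)  0x00007fd96f1e7b7b in read () from
/lib/libc.so.6
  1 Thread 0x7fd96fd306e0 (LWP 8432)  0x00007fd96f48ab99 in
pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
"""

if __name__ == "__main__":
    print(non_idle_threads(SAMPLE))  # prints [(2, 8445, 'read')]
```

For the session above this leaves only thread 2, the one sitting in
read(), which is the thread to inspect with "thread 2" and "where".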



fredrum wrote:
> Hi Andre,
> hope you had a good weekend!? :)
>
> I did,
>   
>> grep master_spool_dir $SGE_ROOT/default/common/bootstrap
>>     
>
> got,
> qmaster_spool_dir       /usr/SGE62u4/default/spool/qmaster
>
>
>   
>> ls -rt /usr/SGE62u4/default/spool/qmaster
>>     
>
> job_scripts
> qmaster.pid
> messages/
> jobseqnum
> arseqnum
> heartbeat
>
> messages is empty. So no jgdi* file there either. 
> I can only find the ones I mentioned earlier in,
> $SGE_ROOT/lib
> and
> $SGE_ROOT/sgeinspect/sgeinspect/modules/ext
>
>
>
> Does this mean that the installation is somehow wrong or incomplete?
>
>
> cheers
> fred
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=225790
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>   


-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Andre Alefeld                Phone: ++49 (0)941 3075-255
Software Engineering         Fax:   ++49 (0)941 3075-222
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7	     mailto: andre.alefeld at sun.com
D-93049 Regensburg           http://www.sun.com/gridware

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226126
