Custom Query (431 matches)

Results (1 - 3 of 431)

Ticket Resolution Summary Owner Reporter
#1441 fixed SoGE 8.1.2 qmaster segfault problem Dave Love <d.love@…> Andreas.Loong@…
Description

I couldn't get the debug binaries to run properly; however, another user sent me the output below, which I hope is enough:

From: baf035 baf035@…
Sent: 2 November 2012 14:34
To: Loong, Andreas
Subject: Re: [gridengine users] SoGE 8.1.2 segfault problem

Hi,

I can confirm the described behaviour. SoGE was compiled with -debug; the qmaster server system is:

cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1
~# uname -r
2.6.32.59-0.7-xen

SGE_ND=1 gdb -batch -ex run -ex 'bt full' sge_qmaster | tee sge_master_gdb2.log


Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
Try: zypper install -C "debuginfo(build-id)=c1807b5762068e6c5f4a6a0ed48d9d4469965351"
Missing separate debuginfo for /usr/lib64/libssl.so.0.9.8
Try: zypper install -C "debuginfo(build-id)=d18ef9c9ddb90ed79b550ba6399c00874bc86345"
Missing separate debuginfo for /usr/lib64/libcrypto.so.0.9.8
Try: zypper install -C "debuginfo(build-id)=abcd98fb64029fea0fc96116be5f178a429e63d5"
Missing separate debuginfo for /lib64/libdl.so.2
Try: zypper install -C "debuginfo(build-id)=f607b21f9a513c99bba9539050c01236d19bf22b"
Missing separate debuginfo for /lib64/libm.so.6
Try: zypper install -C "debuginfo(build-id)=4e9fa1a2c1141fc0123a142783efd044c40bdaaf"
Missing separate debuginfo for /lib64/libpthread.so.0
Try: zypper install -C "debuginfo(build-id)=341d7c595fd2db49df98b8a6ae2c319f46b43c5b"
Missing separate debuginfo for /lib64/libc.so.6
Try: zypper install -C "debuginfo(build-id)=9e0264386fde8570b215fd4c32465fdda3c1c996"
[Thread debugging using libthread_db enabled]
Missing separate debuginfo for /lib64/libz.so.1
Try: zypper install -C "debuginfo(build-id)=4c05d1eb180f9c02b81a0c559c813dada91e0ca4"
[New Thread 0x7ffff69ff700 (LWP 31685)]
[New Thread 0x7ffff61fe700 (LWP 31686)]
[New Thread 0x7ffff59fd700 (LWP 31687)]
[New Thread 0x7ffff51fc700 (LWP 31688)]
Reading in Master_Job_List.

read job database with 0 entries in 0 seconds
error: error opening file "/jms/spool/i005/sge_spool/qmaster/./sharetree" for reading: No such file or directory
nr of dynamic event clients exceeds max file descriptor limit, setting MAX_DYN_EC=979
qmaster hard descriptor limit is set to 8192
qmaster soft descriptor limit is set to 1024
qmaster will use max. 1004 file descriptors for communication
qmaster will accept max. 979 dynamic event clients
starting up SGE 8.1.3pre (lx-amd64)
Q:1, AQ:285 J:0(0), H:832(832), C:225, A:13, D:1, P:2, CKPT:1, US:3, PR:9, RQS:0, AR:0, S:nd:0/lf:0


Q:277, AQ:285 J:0(0), H:832(832), C:225, A:13, D:1, P:2, CKPT:1, US:3, PR:9, RQS:0, AR:0, S:nd:0/lf:0


Q:279, AQ:285 J:0(0), H:832(832), C:225, A:13, D:1, P:2, CKPT:1, US:3, PR:9, RQS:0, AR:0, S:nd:0/lf:0


..

:281, AQ:285 J:0(0), H:832(832), C:225, A:13, D:1, P:2, CKPT:1, US:3, PR:9, RQS:0, AR:0, S:nd:0/lf:0


Q:281, AQ:285 J:0(0), H:832(832), C:225, A:13, D:1, P:2, C
[New Thread 0x7ffff41ff700 (LWP 31691)]
[New Thread 0x7ffff39fe700 (LWP 31692)]
[New Thread 0x7ffff31fd700 (LWP 31693)]
[New Thread 0x7ffff29fc700 (LWP 31694)]
[New Thread 0x7ffff21fb700 (LWP 31695)]
[New Thread 0x7ffff19fa700 (LWP 31696)]
[New Thread 0x7ffff11f9700 (LWP 31697)]
[New Thread 0x7ffff09f8700 (LWP 31698)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff19fa700 (LWP 31696)]
0x00000000004995c6 in do_gdi_packet (monitor=<optimized out>, aMsg=<optimized out>, answer_list=<optimized out>, ctx=<optimized out>) at ../daemons/qmaster/sge_qmaster_process_message.c:195
195 ../daemons/qmaster/sge_qmaster_process_message.c: No such file or directory.
in ../daemons/qmaster/sge_qmaster_process_message.c

#0 0x00000000004995c6 in do_gdi_packet (monitor=<optimized out>, aMsg=<optimized out>, answer_list=<optimized out>, ctx=<optimized out>) at ../daemons/qmaster/sge_qmaster_process_message.c:195
        packet = 0x0
        local_ret = false
        SGE_FUNC = "do_gdi_packet"
#1 sge_qmaster_process_message (ctx=0x7ffff42b6800, monitor=<optimized out>) at ../daemons/qmaster/sge_qmaster_process_message.c:158
        res = <optimized out>
        msg = {snd_host = "sget5.hpc.domain.com", '\000' <repeats 40 times>, snd_name = "qstat", '\000' <repeats 58 times>, snd_id = 637, tag = 2, request_mid = 2, buf = {head_ptr = 0x7fffefc17800 "", cur_ptr = 0x7fffefc1783e "", mem_size = 1241, bytes_used = 62, just_count = 0, version = 268566528}}
        SGE_FUNC = "sge_qmaster_process_message"
#2 0x000000000042fb74 in sge_listener_main (arg=<optimized out>) at ../daemons/qmaster/sge_thread_listener.c:169
        thread_config = 0x7ffff4693de0
        monitor = {thread_name = 0x7ffff42a7360 "listener000", monitor_time = 0, log_monitor_mes = false, output_line1 = 0x7ffff42ab1a0, output_line2 = 0x7ffff42ab1c0, work_line = 0x7ffff42ab1c0, pos = 6, now = {tv_sec = 0, tv_usec = 0}, output = false, message_in_count = 0, message_out_count = 0, idle = 0, wait = 0, ext_type = LIS_EXT, ext_data = 0x7ffff42a7370, ext_data_size = 16, ext_output = 0x5c8ac0 <ext_lis_output>}
        ctx = 0x7ffff42b6800
        next_prof_output = 0
        SGE_FUNC = "sge_listener_main"
#3 0x00007ffff717b6a6 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007ffff6eeaf7d in clone () from /lib64/libc.so.6
No symbol table info available.
#5 0x0000000000000000 in ?? ()
No symbol table info available.


Sge_qmaster died randomly without significant load on server:

[2012-10-31 18:42:03] sge_qmaster[8181]: segfault at 68 ip 00000000004995c6 sp 00007f40acbf9c50 error 6 in sge_qmaster[400000+267000]
[2012-10-31 20:35:03] sge_qmaster[8895]: segfault at 68 ip 00000000004995c6 sp 00007fb8b8dfac50 error 6 in sge_qmaster[400000+267000]
[2012-11-01 04:50:03] sge_qmaster[13225]: segfault at 68 ip 00000000004995c6 sp 00007f62cd4f9c50 error 6 in sge_qmaster[400000+267000]
[2012-11-01 08:56:04] sge_qmaster[15224]: segfault at 68 ip 00000000004995c6 sp 00007f9b727f9c50 error 6 in sge_qmaster[400000+267000]
[2012-11-01 10:56:02] sge_qmaster[16860]: segfault at 68 ip 00000000004995c6 sp 00007f0c183f9c50 error 6 in sge_qmaster[400000+267000]

Bye

BaF035


Confidentiality Notice: This message is private and may contain confidential and proprietary information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this message is not permitted and may be unlawful.
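As a cross-check on the report in #1441, the kernel segfault lines can be compared against the gdb backtrace. The sketch below is only an illustration, not part of the original report: the regular expression and the helper names `parse_segfaults` and `decode_error` are made up for this example, and the error-code decoding assumes the usual x86 page-fault bits (bit 0 = protection fault, bit 1 = write, bit 2 = user mode). It shows that all five crashes share the instruction pointer of frame #0 (0x4995c6) and the same fault address 0x68, which would be consistent with a write through the NULL `packet` pointer, although the log alone does not prove that.

```python
import re

# Kernel log lines copied from the ticket description.
LOG = """\
[2012-10-31 18:42:03] sge_qmaster[8181]: segfault at 68 ip 00000000004995c6 sp 00007f40acbf9c50 error 6 in sge_qmaster[400000+267000]
[2012-10-31 20:35:03] sge_qmaster[8895]: segfault at 68 ip 00000000004995c6 sp 00007fb8b8dfac50 error 6 in sge_qmaster[400000+267000]
[2012-11-01 04:50:03] sge_qmaster[13225]: segfault at 68 ip 00000000004995c6 sp 00007f62cd4f9c50 error 6 in sge_qmaster[400000+267000]
[2012-11-01 08:56:04] sge_qmaster[15224]: segfault at 68 ip 00000000004995c6 sp 00007f9b727f9c50 error 6 in sge_qmaster[400000+267000]
[2012-11-01 10:56:02] sge_qmaster[16860]: segfault at 68 ip 00000000004995c6 sp 00007f0c183f9c50 error 6 in sge_qmaster[400000+267000]
"""

# Hypothetical pattern for this example; it only covers the fields seen above.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<image>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfaults(text):
    """Return one dict per kernel segfault line, with numeric fields as ints."""
    return [
        {
            "addr": int(m["addr"], 16),   # faulting memory address
            "ip": int(m["ip"], 16),       # instruction pointer at the fault
            "error": int(m["err"]),       # page-fault error code
            "image": m["image"],
        }
        for m in PATTERN.finditer(text)
    ]


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code (assumed layout)."""
    return {
        "protection_fault": bool(code & 1),  # False means the page was not present
        "write": bool(code & 2),
        "user_mode": bool(code & 4),
    }


GDB_FRAME0 = 0x4995C6  # address of frame #0 (do_gdi_packet) in the backtrace

records = parse_segfaults(LOG)
ips = {r["ip"] for r in records}
addrs = {r["addr"] for r in records}

print(len(records), "crashes")
print("same ip as gdb frame #0:", ips == {GDB_FRAME0})
print("fault addresses:", sorted(hex(a) for a in addrs))
print("error 6 decodes to:", decode_error(6))
```

If the crash site were moving between runs, the `ips` set would contain more than one value; here it does not, which suggests a single reproducible code path.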

#1526 invalid Bug writing pe_hostfile binding strategy ? Didier.Rebeix@…
Description

Hi there,

While trying to use the SGE core binding feature, I'm seeing a strange binding strategy in the generated pe_hostfile.

If I submit a dmp job with "-binding pe linear:slots", every node in the pe_hostfile seems to get the same binding strategy.

Below are two examples of strange pe_hostfiles and the corresponding qsub options:

#################### example 1 ####################
qsub -q batch -pe dmp* 64 -binding pe linear:slots simple.job

part061.u-bourgogne.fr 12 batch@… 0,0:0,1:0,2:0,3:0,4:0,5:1,0:1,1:1,2:1,3:1,4:1,5
part081.u-bourgogne.fr 12 batch@… 0,0:0,1:0,2:0,3:0,4:0,5:1,0:1,1:1,2:1,3:1,4:1,5
part065.u-bourgogne.fr 12 batch@… 0,0:0,1:0,2:0,3:0,4:0,5:1,0:1,1:1,2:1,3:1,4:1,5
part060.u-bourgogne.fr 11 batch@… 0,0:0,1:0,2:0,3:0,4:0,5:1,0:1,1:1,2:1,3:1,4:1,5
part083.u-bourgogne.fr 10 batch@… 0,0:0,1:0,2:0,3:0,4:0,5:1,0:1,1:1,2:1,3:1,4:1,5
part082.u-bourgogne.fr 7 batch@… 0,0:0,1:0,2:0,3:0,4:0,5:1,0:1,1:1,2:1,3:1,4:1,5

#################### example 2 ####################
qsub -q batch -pe dmp* 18 -binding pe linear:slots simple.job

part065.u-bourgogne.fr 6 batch@… 1,0:1,1:1,2:1,3:1,4:1,5
part061.u-bourgogne.fr 12 batch@… 1,0:1,1:1,2:1,3:1,4:1,5

It looks like the binding strategy for the first host is calculated correctly but is then wrongly applied to all the other nodes.
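The observation in #1526 can be checked mechanically. The sketch below is only an illustration under stated assumptions: it assumes each pe_hostfile line has the four whitespace-separated fields seen in the examples (host, slots, queue, binding), that the binding is a colon-separated list of socket,core pairs, and that with linear:slots a host would be expected to bind as many cores as it has slots. The helper names `parse_pe_hostfile` and `mismatched_hosts` are made up for this example, and the queue names are left truncated exactly as they appear in the ticket.

```python
# pe_hostfile contents copied from example 2 in the ticket.
EXAMPLE_2 = """\
part065.u-bourgogne.fr 6 batch@… 1,0:1,1:1,2:1,3:1,4:1,5
part061.u-bourgogne.fr 12 batch@… 1,0:1,1:1,2:1,3:1,4:1,5
"""


def parse_pe_hostfile(text):
    """Parse lines of the form 'host slots queue socket,core:socket,core:...'."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        host, slots, queue, binding = fields
        cores = [
            tuple(int(n) for n in pair.split(","))  # (socket, core)
            for pair in binding.split(":")
        ]
        entries.append({"host": host, "slots": int(slots), "queue": queue, "cores": cores})
    return entries


def mismatched_hosts(entries):
    """Hosts whose number of bound cores differs from their slot count.

    Assumption for this example only: one bound core per slot is expected.
    """
    return [e["host"] for e in entries if len(e["cores"]) != e["slots"]]


entries = parse_pe_hostfile(EXAMPLE_2)
bindings = {tuple(e["cores"]) for e in entries}

print("identical binding on every host:", len(bindings) == 1)
print("hosts where cores != slots:", mismatched_hosts(entries))
```

For example 2 this flags part061.u-bourgogne.fr, which has 12 slots but only the six cores of socket 1 listed, the same binding as the 6-slot host.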

I'm using sge-8.1.8.

All my dmp PEs (one per IB switch) are configured the same way:

# qconf -sp dmp_swib1
pe_name            dmp_swib1
slots              1000
user_lists         NONE
xuser_lists        NONE
start_proc_args    /usr/ccub/sge/pe/dmp/startdmp.sh -catch_rsh $pe_hostfile
stop_proc_args     /usr/ccub/sge/pe/dmp/stopdmp.sh
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE
qsort_args         NONE

Feature or bug?

Thanks!

--

Didier Rebeix

Centre de Calcul et Messageries Université de Bourgogne Maison de l’université Esplanade Erasme - BP 27877 21078 Dijon Cedex

TEL : 03.80.39.52.05 / FAX : 03.80.39.52.69

#1349 fixed RHEL 6 build problems dlove Florian.LaRoche@…
Description
Hello,

I've compiled the current gridengine source rpm with the Red Hat
buildsystem koji and put the rpm packages here:

    http://jur-linux.org/rpms/el-updates/5/
    http://jur-linux.org/rpms/el-updates/6/

For RHEL6, the following kludgy changes made it compile:
--- gridengine.spec
+++ gridengine.spec
@@ -152,7 +152,7 @@

 # -O2/-O3 gives warnings about type puns.  It's not clear whether
 # they're serious, but -fno-strict-aliasing just in case.
-export SGE_INPUT_CFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing"
+export SGE_INPUT_CFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing -I /usr/include/freetype2"
 [ -n "$RPM_BUILD_NCPUS" ] && parallel_flags="-parallel $RPM_BUILD_NCPUS"
 %if %{without java}
 JAVA_BUILD_OPTIONS="-no-java -no-jni"
@@ -182,8 +182,8 @@
   rm man/man8/SGE_Helper_Service.exe.8
   rm -r util/gui-installer util/sgeSMF
   rm start_gui_installer
-  for l in lib/*/libdrmaa.so.1; do
-    ( cd $(dirname $l); ln -sf libdrmaa.so.1 libdrmaa.so; )
+  for l in lib/*/libdrmaa.so.1.0; do
+    ( cd $(dirname $l); ln -sf libdrmaa.so.1.0 libdrmaa.so; )
   done
   gzip man/man*/*
 )


Please let me know if you run these rpms and they work out ok for you.

best regards,

Florian La Roche
