Custom Query (431 matches)


Results (73 - 75 of 431)

Ticket | Resolution | Summary | Owner | Reporter
#1432 | fixed | qstat -s z shows jobs as waiting | Dave Love <d.love@…> | dlove
Description

qstat -s z shows the jobs' status as "qw", which is at least confusing. Perhaps zombie jobs should have a JFINISHED status?

#1436 | fixed | man jsv_script_interface shows wrong call in examples section | Dave Love <d.love@…> | Reuti
Description

The man page and the HTML document contain the following example at the end:

              function call                      returned value
              ------------------------------------------------------
              jsv_is_param(l_hard)               "true"
              jsv_get_param(l_hard)              "mem=1G,mem2=200M"
              jsv_sub_is_param(l_hard,mem)       "true"
              jsv_sub_get_param(l_hard,mem)      "1G"
              jsv_sub_get_param(l_hard,mem3)     "false"
              jsv_sub_get_param(l_hard,mem3)     ""

Most likely the second-to-last line should read jsv_sub_is_param instead of jsv_sub_get_param, since "false" is the answer to an existence test rather than a parameter value.
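The distinction the report relies on can be illustrated with a small model of the two sub-parameter calls. This is a minimal Python sketch, assuming (as the man page example implies) that the l_hard value is a comma-separated list of name=value pairs; sub_is_param and sub_get_param are hypothetical stand-ins for illustration, not the real JSV functions. An "is" call answers "true" or "false", while a "get" call returns the value or an empty string, so in this model the "false" row can only come from an is_param call.

```python
from typing import Optional


def _find_sub_param(value_list: str, name: str) -> Optional[str]:
    """Return the value for `name` in a "k1=v1,k2=v2" list, or None if absent."""
    for item in value_list.split(","):
        key, sep, value = item.partition("=")
        # Whole-name match only, so "mem" never matches "mem2".
        if sep and key == name:
            return value
    return None


def sub_is_param(value_list: str, name: str) -> str:
    """Hypothetical model of jsv_sub_is_param: answers "true" or "false"."""
    return "true" if _find_sub_param(value_list, name) is not None else "false"


def sub_get_param(value_list: str, name: str) -> str:
    """Hypothetical model of jsv_sub_get_param: the value, or "" if absent."""
    value = _find_sub_param(value_list, name)
    return "" if value is None else value


if __name__ == "__main__":
    l_hard = "mem=1G,mem2=200M"
    # Reproduce the man page table with the corrected second-to-last call.
    for call, result in [
        ("jsv_sub_is_param(l_hard,mem)", sub_is_param(l_hard, "mem")),
        ("jsv_sub_get_param(l_hard,mem)", sub_get_param(l_hard, "mem")),
        ("jsv_sub_is_param(l_hard,mem3)", sub_is_param(l_hard, "mem3")),
        ("jsv_sub_get_param(l_hard,mem3)", sub_get_param(l_hard, "mem3")),
    ]:
        print(f'{call:<35}"{result}"')
```

Run as a script, it prints the four calls with "true", "1G", "false" and "" in turn.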

#1441 | fixed | SoGE 8.1.2 qmaster segfault problem | Dave Love <d.love@…> | Andreas.Loong@…
Description

I couldn't get the debug binaries to run properly; however, another user sent me the output below. I hope it's enough:

From: baf035 baf035@…
Sent: 2 November 2012 14:34
To: Loong, Andreas
Subject: Re: [gridengine users] SoGE 8.1.2 segfault problem

Hi,

I can confirm the described behaviour. SoGE was compiled with -debug. The qmaster server system is:

    cat /etc/SuSE-release
    SUSE Linux Enterprise Server 11 (x86_64)
    VERSION = 11
    PATCHLEVEL = 1
    ~# uname -r
    2.6.32.59-0.7-xen

    SGE_ND=1 gdb -batch -ex run -ex 'bt full' sge_qmaster | tee sge_master_gdb2.log

    Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
    Try: zypper install -C "debuginfo(build-id)=c1807b5762068e6c5f4a6a0ed48d9d4469965351"
    Missing separate debuginfo for /usr/lib64/libssl.so.0.9.8
    Try: zypper install -C "debuginfo(build-id)=d18ef9c9ddb90ed79b550ba6399c00874bc86345"
    Missing separate debuginfo for /usr/lib64/libcrypto.so.0.9.8
    Try: zypper install -C "debuginfo(build-id)=abcd98fb64029fea0fc96116be5f178a429e63d5"
    Missing separate debuginfo for /lib64/libdl.so.2
    Try: zypper install -C "debuginfo(build-id)=f607b21f9a513c99bba9539050c01236d19bf22b"
    Missing separate debuginfo for /lib64/libm.so.6
    Try: zypper install -C "debuginfo(build-id)=4e9fa1a2c1141fc0123a142783efd044c40bdaaf"
    Missing separate debuginfo for /lib64/libpthread.so.0
    Try: zypper install -C "debuginfo(build-id)=341d7c595fd2db49df98b8a6ae2c319f46b43c5b"
    Missing separate debuginfo for /lib64/libc.so.6
    Try: zypper install -C "debuginfo(build-id)=9e0264386fde8570b215fd4c32465fdda3c1c996"
    [Thread debugging using libthread_db enabled]
    Missing separate debuginfo for /lib64/libz.so.1
    Try: zypper install -C "debuginfo(build-id)=4c05d1eb180f9c02b81a0c559c813dada91e0ca4"
    [New Thread 0x7ffff69ff700 (LWP 31685)]
    [New Thread 0x7ffff61fe700 (LWP 31686)]
    [New Thread 0x7ffff59fd700 (LWP 31687)]
    [New Thread 0x7ffff51fc700 (LWP 31688)]
    Reading in Master_Job_List.
    read job database with 0 entries in 0 seconds
    error: error opening file "/jms/spool/i005/sge_spool/qmaster/./sharetree" for reading: No such file or directory
    nr of dynamic event clients exceeds max file descriptor limit, setting MAX_DYN_EC=979
    qmaster hard descriptor limit is set to 8192
    qmaster soft descriptor limit is set to 1024
    qmaster will use max. 1004 file descriptors for communication
    qmaster will accept max. 979 dynamic event clients
    starting up SGE 8.1.3pre (lx-amd64)
    Q:1, AQ:285 J:0(0), H:832(832), C:225, A:13, D:1, P:2, CKPT:1, US:3, PR:9, RQS:0, AR:0, S:nd:0/lf:0
    Q:277, AQ:285 J:0(0), H:832(832), C:225, A:13, D:1, P:2, CKPT:1, US:3, PR:9, RQS:0, AR:0, S:nd:0/lf:0
    Q:279, AQ:285 J:0(0), H:832(832), C:225, A:13, D:1, P:2, CKPT:1, US:3, PR:9, RQS:0, AR:0, S:nd:0/lf:0
    ..
    :281, AQ:285 J:0(0), H:832(832), C:225, A:13, D:1, P:2, CKPT:1, US:3, PR:9, RQS:0, AR:0, S:nd:0/lf:0
    Q:281, AQ:285 J:0(0), H:832(832), C:225, A:13, D:1, P:2, C
    [New Thread 0x7ffff41ff700 (LWP 31691)]
    [New Thread 0x7ffff39fe700 (LWP 31692)]
    [New Thread 0x7ffff31fd700 (LWP 31693)]
    [New Thread 0x7ffff29fc700 (LWP 31694)]
    [New Thread 0x7ffff21fb700 (LWP 31695)]
    [New Thread 0x7ffff19fa700 (LWP 31696)]
    [New Thread 0x7ffff11f9700 (LWP 31697)]
    [New Thread 0x7ffff09f8700 (LWP 31698)]

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7ffff19fa700 (LWP 31696)]
    0x00000000004995c6 in do_gdi_packet (monitor=<optimized out>, aMsg=<optimized out>, answer_list=<optimized out>, ctx=<optimized out>) at ../daemons/qmaster/sge_qmaster_process_message.c:195
    195     ../daemons/qmaster/sge_qmaster_process_message.c: No such file or directory.
            in ../daemons/qmaster/sge_qmaster_process_message.c
    #0  0x00000000004995c6 in do_gdi_packet (monitor=<optimized out>, aMsg=<optimized out>, answer_list=<optimized out>, ctx=<optimized out>) at ../daemons/qmaster/sge_qmaster_process_message.c:195
            packet = 0x0
            local_ret = false
            SGE_FUNC = "do_gdi_packet"
    #1  sge_qmaster_process_message (ctx=0x7ffff42b6800, monitor=<optimized out>) at ../daemons/qmaster/sge_qmaster_process_message.c:158
            res = <optimized out>
            msg = {snd_host = "sget5.hpc.domain.com", '\000' <repeats 40 times>, snd_name = "qstat", '\000' <repeats 58 times>, snd_id = 637, tag = 2, request_mid = 2, buf = {head_ptr = 0x7fffefc17800 "", cur_ptr = 0x7fffefc1783e "", mem_size = 1241, bytes_used = 62, just_count = 0, version = 268566528}}
            SGE_FUNC = "sge_qmaster_process_message"
    #2  0x000000000042fb74 in sge_listener_main (arg=<optimized out>) at ../daemons/qmaster/sge_thread_listener.c:169
            thread_config = 0x7ffff4693de0
            monitor = {thread_name = 0x7ffff42a7360 "listener000", monitor_time = 0, log_monitor_mes = false, output_line1 = 0x7ffff42ab1a0, output_line2 = 0x7ffff42ab1c0, work_line = 0x7ffff42ab1c0, pos = 6, now = {tv_sec = 0, tv_usec = 0}, output = false, message_in_count = 0, message_out_count = 0, idle = 0, wait = 0, ext_type = LIS_EXT, ext_data = 0x7ffff42a7370, ext_data_size = 16, ext_output = 0x5c8ac0 <ext_lis_output>}
            ctx = 0x7ffff42b6800
            next_prof_output = 0
            SGE_FUNC = "sge_listener_main"
    #3  0x00007ffff717b6a6 in start_thread () from /lib64/libpthread.so.0
    No symbol table info available.
    #4  0x00007ffff6eeaf7d in clone () from /lib64/libc.so.6
    No symbol table info available.
    #5  0x0000000000000000 in ?? ()
    No symbol table info available.


sge_qmaster died at random times, without significant load on the server:

    [2012-10-31 18:42:03] sge_qmaster[8181]: segfault at 68 ip 00000000004995c6 sp 00007f40acbf9c50 error 6 in sge_qmaster[400000+267000]
    [2012-10-31 20:35:03] sge_qmaster[8895]: segfault at 68 ip 00000000004995c6 sp 00007fb8b8dfac50 error 6 in sge_qmaster[400000+267000]
    [2012-11-01 04:50:03] sge_qmaster[13225]: segfault at 68 ip 00000000004995c6 sp 00007f62cd4f9c50 error 6 in sge_qmaster[400000+267000]
    [2012-11-01 08:56:04] sge_qmaster[15224]: segfault at 68 ip 00000000004995c6 sp 00007f9b727f9c50 error 6 in sge_qmaster[400000+267000]
    [2012-11-01 10:56:02] sge_qmaster[16860]: segfault at 68 ip 00000000004995c6 sp 00007f0c183f9c50 error 6 in sge_qmaster[400000+267000]

Bye

BaF035
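The kernel segfault lines quoted above can be decoded mechanically. The following is a minimal Python sketch, assuming the x86 page-fault error-code layout (bit 0 set for a protection violation and clear for a non-present page, bit 1 set for a write, bit 2 set for a user-mode access); parse_segfaults, Segfault and the 4 KiB heuristic are hypothetical helpers, not part of Grid Engine. For every entry, error 6 decodes to a user-mode write to a non-present page at address 0x68, and the ip 0x4995c6 (offset 0x995c6 into sge_qmaster) is the same address gdb reports for frame #0 in do_gdi_packet. Together with packet = 0x0 in that frame, this is consistent with, though not proof of, a write through a NULL pointer to a field at offset 0x68.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches one x86 kernel "segfault at" record in the form quoted in this ticket.
# The kernel prints the addresses, the error code and the mapping base/size in hex.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[\w.-]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


@dataclass
class Segfault:
    comm: str
    pid: int
    fault_addr: int  # address the faulting instruction tried to access
    ip: int          # instruction pointer at the time of the fault
    error: int       # x86 page-fault error code
    base: int        # load address of the mapping that contains ip
    size: int        # size of that mapping

    @property
    def ip_offset(self) -> int:
        """Instruction pointer relative to the start of the mapping."""
        return self.ip - self.base

    @property
    def in_first_page(self) -> bool:
        """Hypothetical heuristic: an address below 4 KiB often means NULL plus a field offset."""
        return self.fault_addr < 0x1000

    def describe_error(self) -> str:
        """Decode bits 0-2 of the x86 page-fault error code; higher bits are ignored here."""
        cause = "protection violation" if self.error & 0x1 else "page not present"
        access = "write" if self.error & 0x2 else "read"
        mode = "user" if self.error & 0x4 else "kernel"
        return f"{access} in {mode} mode, {cause}"


def parse_segfaults(text: str) -> List[Segfault]:
    """Extract every segfault record from a block of kernel log text."""
    return [
        Segfault(
            comm=m["comm"],
            pid=int(m["pid"]),
            fault_addr=int(m["addr"], 16),
            ip=int(m["ip"], 16),
            error=int(m["err"], 16),
            base=int(m["base"], 16),
            size=int(m["size"], 16),
        )
        for m in SEGFAULT_RE.finditer(text)
    ]


# Two of the log entries quoted in this ticket.
log = (
    "[2012-10-31 20:35:03] sge_qmaster[8895]: segfault at 68 ip 00000000004995c6 "
    "sp 00007fb8b8dfac50 error 6 in sge_qmaster[400000+267000]\n"
    "[2012-11-01 04:50:03] sge_qmaster[13225]: segfault at 68 ip 00000000004995c6 "
    "sp 00007f62cd4f9c50 error 6 in sge_qmaster[400000+267000]\n"
)

for fault in parse_segfaults(log):
    print(fault.pid, hex(fault.fault_addr), hex(fault.ip_offset), fault.describe_error())
```

Because the regex does not require the leading timestamp, it also picks up records that were run together on one line, as in the original paste.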

