[GE issues] [Issue 3233] New - slotwise preemption fails to unsuspend one job per host

stephendennis sdennis at univaud.com
Wed Jan 27 20:59:56 GMT 2010


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3233
                 Issue #|3233
                 Summary|slotwise preemption fails to unsuspend one job per host
               Component|gridengine
                 Version|current
                Platform|All
                     URL|
              OS/Version|All
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|scheduling
             Assigned to|andreas
             Reported by|stephendennis






------- Additional comments from stephendennis at sunsource.net Wed Jan 27 12:59:52 -0800 2010 -------
The following simple sequence leaves one job per host
suspended after all superordinate jobs have completed.

A terminal capture demonstrating the bug follows the
details below.

Changing the subordinate_list to the form
subordinate_list      slots=4(low:0:sr), \
                      [@2slot=slots=2(low:0:sr)], \
                      [@4slot=slots=4(low:0:sr)]
shows the same flaw.

The bug is also reproducible with multiple execution hosts.
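
For reference, a rough sketch of the slot arithmetic that slotwise
preemption is expected to follow. It assumes the queue_conf(5) reading
that jobs in the subordinate queue are suspended only while the used
slots of the superordinate queue plus its subordinates on a host exceed
the threshold, and that every job occupies one slot. The function name
expected_suspended is hypothetical and not part of Grid Engine; the
values 3/3/3 match the high and low queues used in the capture.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): number of subordinate
# jobs expected to be suspended on one host, assuming single-slot jobs
# and the rule "suspend while high_used + low_used > threshold".
expected_suspended() {
    threshold=$1
    high_used=$2
    low_used=$3
    excess=$(( high_used + low_used - threshold ))
    # Never report a negative count.
    if [ "$excess" -lt 0 ]; then excess=0; fi
    # Cannot suspend more jobs than the subordinate queue is running.
    if [ "$excess" -gt "$low_used" ]; then excess=$low_used; fi
    echo "$excess"
}

expected_suspended 3 3 3   # high queue full: all three low jobs suspended
expected_suspended 3 0 3   # high queue drained: no low job should stay suspended
```

Under these assumptions the second call yields 0, whereas the capture
shows one low job still in state S after the high queue has drained.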

 sd@pursuit:~$ qstat -help | head -1
 SGE 6.2u5
 sd@pursuit:~$ qconf -sq low |grep -v INFIN|grep -v NONE
 qname                 low
 hostlist              @allhosts
 seq_no                0
 load_thresholds       np_load_avg=1.75
 nsuspend              1
 suspend_interval      00:05:00
 priority              0
 min_cpu_interval      00:05:00
 processors            UNDEFINED
 qtype                 BATCH INTERACTIVE
 pe_list               make
 rerun                 FALSE
 slots                 3
 tmpdir                /tmp
 shell                 /bin/csh
 shell_start_mode      posix_compliant
 notify                00:00:60
 initial_state         default
 sd@pursuit:~$ qconf -sq high |grep -v INFIN|grep -v NONE
 qname                 high
 hostlist              @allhosts
 seq_no                0
 load_thresholds       np_load_avg=1.75
 nsuspend              1
 suspend_interval      00:05:00
 priority              0
 min_cpu_interval      00:05:00
 processors            UNDEFINED
 qtype                 BATCH INTERACTIVE
 pe_list               make
 rerun                 FALSE
 slots                 3
 tmpdir                /tmp
 shell                 /bin/csh
 shell_start_mode      posix_compliant
 notify                00:00:60
 subordinate_list      slots=3(low:1:sr)
 initial_state         default
 sd@pursuit:~$ qstat -f
 queuename                      qtype resv/used/tot. load_avg arch          states
 ---------------------------------------------------------------------------------
 all.q@pursuit                  BIP   0/0/2          0.22     lx24-amd64
 ---------------------------------------------------------------------------------
 high@pursuit                   BIP   0/0/3          0.22     lx24-amd64
 ---------------------------------------------------------------------------------
 low@pursuit                    BIP   0/0/3          0.22     lx24-amd64
 sd@pursuit:~$ for i in `seq 1 3` ; do qsub -b y -q low sleep 1000; done
 Your job 14 ("sleep") has been submitted
 Your job 15 ("sleep") has been submitted
 Your job 16 ("sleep") has been submitted
 sd@pursuit:~$ for i in `seq 1 9` ; do qsub -b y -q high sleep 30; done
 Your job 17 ("sleep") has been submitted
 Your job 18 ("sleep") has been submitted
 Your job 19 ("sleep") has been submitted
 Your job 20 ("sleep") has been submitted
 Your job 21 ("sleep") has been submitted
 Your job 22 ("sleep") has been submitted
 Your job 23 ("sleep") has been submitted
 Your job 24 ("sleep") has been submitted
 Your job 25 ("sleep") has been submitted
 sd@pursuit:~$ qstat
 job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
 -----------------------------------------------------------------------------------------------------------------
      14 0.55500 sleep      sd           S     12/03/2009 16:43:23 low@pursuit                        1
      15 0.55500 sleep      sd           S     12/03/2009 16:43:23 low@pursuit                        1
      16 0.55500 sleep      sd           S     12/03/2009 16:43:23 low@pursuit                        1
      17 0.55500 sleep      sd           r     12/03/2009 16:43:55 high@pursuit                       1
      18 0.55500 sleep      sd           r     12/03/2009 16:43:55 high@pursuit                       1
      19 0.55500 sleep      sd           r     12/03/2009 16:43:55 high@pursuit                       1
      20 0.00000 sleep      sd           qw    12/03/2009 16:43:52                                    1
      21 0.00000 sleep      sd           qw    12/03/2009 16:43:52                                    1
      22 0.00000 sleep      sd           qw    12/03/2009 16:43:52                                    1
      23 0.00000 sleep      sd           qw    12/03/2009 16:43:52                                    1
      24 0.00000 sleep      sd           qw    12/03/2009 16:43:52                                    1
      25 0.00000 sleep      sd           qw    12/03/2009 16:43:52                                    1
 sd@pursuit:~$ sleep 300;qstat
 sd@pursuit:~$ qstat
 job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
 -----------------------------------------------------------------------------------------------------------------
      14 0.55500 sleep      sd           S     12/03/2009 16:43:23 low@pursuit                        1
      15 0.55500 sleep      sd           r     12/03/2009 16:43:23 low@pursuit                        1
      16 0.55500 sleep      sd           r     12/03/2009 16:43:23 low@pursuit                        1
 sd@pursuit:~$ sleep 300;qstat
 job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
 -----------------------------------------------------------------------------------------------------------------
      14 0.55500 sleep      sd           S     12/03/2009 16:43:23 low@pursuit                        1
      15 0.55500 sleep      sd           r     12/03/2009 16:43:23 low@pursuit                        1
      16 0.55500 sleep      sd           r     12/03/2009 16:43:23 low@pursuit                        1
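
To check for the leftover job without reading the table by eye, a small
filter along the following lines may help. It is only a sketch: the
function name suspended_jobs is hypothetical, and it assumes the default
qstat column layout seen in the capture (job-ID first, state fifth) and
job names without embedded whitespace. It prints the IDs of jobs whose
state is exactly "S", the state the capture shows for the low jobs while
the high queue is busy.

```shell
#!/bin/sh
# Hypothetical helper: print the job IDs whose state column is "S"
# from plain `qstat` output read on stdin. Header and separator lines
# are skipped because their first field is not numeric.
suspended_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $5 == "S" { print $1 }'
}

# On a live cluster this would be: qstat | suspended_jobs
# Here the final qstat output from the capture is replayed instead.
suspended_jobs <<'EOF'
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
     14 0.55500 sleep      sd           S     12/03/2009 16:43:23 low@pursuit                        1
     15 0.55500 sleep      sd           r     12/03/2009 16:43:23 low@pursuit                        1
     16 0.55500 sleep      sd           r     12/03/2009 16:43:23 low@pursuit                        1
EOF
```

For the replayed output this reports job 14, the one job that stays
suspended after all jobs in the high queue have finished.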

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=241356
