[GE users] scheduling weirdness in 6.0u3

Sean Dilda agrajag at dragaera.net
Thu Apr 7 14:23:50 BST 2005



Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
> 
> Sean Dilda wrote:
> 
> >>>
>>
>>I may try that if things get bad, but I'd much rather see the software 
>>work as it's supposed to.  Did they give any indication as to when the 
>>problems would be fixed?
>>
> 
> So far I have only noticed a bug in assigning PE jobs. That bug is fixed
> and will be part of u4.  If you need the fix right away, you can get it
> from the maintrunk.


Thanks.  I found that in CVS yesterday and tried applying the following 
patch to 6.0u3:


--- source/libs/sched/sge_select_queue.c        13 Dec 2004 14:10:11 -0000      1.102
+++ source/libs/sched/sge_select_queue.c        23 Feb 2005 09:37:37 -0000      1.103
@@ -3057,14 +3057,7 @@
              if (*previous_load_inited && (*previous_load < lGetDouble(hep, EH_sort_value))) {
                 (*host_seqno)++;
              }
-            else {
-               if (!previous_load_inited) {
-                  *previous_load_inited = true;
-               }
-               else {
-                  /* DPRINTF(("SKIP INCREMENTATION OF HOST_SEQNO\n")) */ ;
-               }
-            }
+            *previous_load_inited = true;
              *previous_load = lGetDouble(hep, EH_sort_value);
              lSetUlong(qep, QU_host_seq_no, *host_seqno);

@@ -3074,8 +3067,6 @@
                 suited_as_master_host = true;
              }

-            /* prepare sort by sequence number of queues */
-            lSetUlong(qep, QU_host_seq_no, *host_seqno);

              DPRINTF(("QUEUE %s TIME: %d + %d -> %d  QEND: %d + %d -> %d (%d soft violations)\n", qname,
                 accu_queue_slots,      qslots,      accu_queue_slots+       qslots,
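
A note on what the first hunk appears to change (this is a reading of the 
diff alone, not something the developers have confirmed): the removed 
branch tests !previous_load_inited, i.e. the pointer, while the condition 
above it dereferences the same name.  If the pointer is never NULL, the 
flag is never set and host_seqno never advances.  The sketch below 
reproduces just that logic in isolation.  The function names 
assign_seqno_old and assign_seqno_new, and passing the load as a plain 
double instead of reading it with lGetDouble(), are assumptions made for 
illustration.

```c
#include <stdbool.h>

/* Hypothetical reduction of the pre-patch logic (1.102).
 * The inner test checks the pointer, not the flag it points to. */
static unsigned assign_seqno_old(bool *previous_load_inited,
                                 double *previous_load,
                                 unsigned *host_seqno, double load)
{
    if (*previous_load_inited && (*previous_load < load)) {
        (*host_seqno)++;
    } else {
        if (!previous_load_inited) {      /* false for any non-NULL pointer */
            *previous_load_inited = true; /* so this is never reached */
        }
    }
    *previous_load = load;
    return *host_seqno;
}

/* Hypothetical reduction of the patched logic (1.103).
 * The flag is set unconditionally after the first host is seen. */
static unsigned assign_seqno_new(bool *previous_load_inited,
                                 double *previous_load,
                                 unsigned *host_seqno, double load)
{
    if (*previous_load_inited && (*previous_load < load)) {
        (*host_seqno)++;
    }
    *previous_load_inited = true;
    *previous_load = load;
    return *host_seqno;
}
```

Feeding both versions a rising series of loads (say 0.1, 0.5, 0.9) with 
the flag initially false, the old version hands every host the same 
sequence number, while the new one gives each higher load the next 
number.  That would be consistent with hosts not being ordered by load 
for PE jobs before the patch.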


The strange thing is that I'm getting different results on my production 
cluster and my test cluster.  The test cluster was seeing problems only 
with parallel jobs, and the above patch seems to have fixed that.  The 
production cluster is having issues with both parallel and non-parallel 
jobs, and the patch didn't seem to change anything there.  I'll do some 
more testing and let you know if I can figure out why it's going haywire.

On a slightly different note, is there any word on when 6.0u4 might be 
released?  I notice there have been a number of updates between 6.0u3 and 
the current maintrunk.


Thanks,


Sean




