[GE users] scheduling weirdness in 6.0u3

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Thu Apr 7 14:38:35 BST 2005



Sean Dilda wrote:

>Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>  
>
>>Sean Dilda wrote:
>>
>>    
>>
>>>I may try that if things get bad, but I'd much rather see the software 
>>>work as it's supposed to.  Did they give any indication as to when the 
>>>problems would be fixed?
>>>
>>>      
>>>
>>So far I have only noticed a bug in assigning PE jobs. That bug is fixed and
>>will be part of u4.  If you need the fix right away, you can get it from the
>>maintrunk.
>>    
>>
>
>
>Thanks.  I found that in CVS yesterday and tried applying the following 
>patch to 6.0u3:
>
>
>--- source/libs/sched/sge_select_queue.c        13 Dec 2004 14:10:11 -0000      1.102
>+++ source/libs/sched/sge_select_queue.c        23 Feb 2005 09:37:37 -0000      1.103
>@@ -3057,14 +3057,7 @@
>              if (*previous_load_inited && (*previous_load < lGetDouble(hep, EH_sort_value))) {
>                 (*host_seqno)++;
>              }
>-            else {
>-               if (!previous_load_inited) {
>-                  *previous_load_inited = true;
>-               }
>-               else {
>-                  /* DPRINTF(("SKIP INCREMENTATION OF HOST_SEQNO\n")) */ ;
>-               }
>-            }
>+            *previous_load_inited = true;
>              *previous_load = lGetDouble(hep, EH_sort_value);
>              lSetUlong(qep, QU_host_seq_no, *host_seqno);
>
>@@ -3074,8 +3067,6 @@
>                 suited_as_master_host = true;
>              }
>
>-            /* prepare sort by sequence number of queues */
>-            lSetUlong(qep, QU_host_seq_no, *host_seqno);
>
>              DPRINTF(("QUEUE %s TIME: %d + %d -> %d  QEND: %d + %d -> %d (%d soft violations)\n", qname,
>                 accu_queue_slots,      qslots,      accu_queue_slots+       qslots,
>
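For readers following the patch, here is a minimal standalone sketch of what the first hunk appears to change. It is not the Grid Engine source: the function names assign_seqno_old, assign_seqno_new and final_seqno are hypothetical, the load value is passed in directly instead of being read with lGetDouble(hep, EH_sort_value), and the flag is assumed to start out false. In the removed branch, `!previous_load_inited` tests the pointer rather than the flag it points to, so under these assumptions the flag is never set and the host sequence number never advances.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature shared by both variants of the update step. */
typedef void (*assign_fn)(double load, bool *previous_load_inited,
                          double *previous_load, unsigned long *host_seqno);

/* Models the pre-patch logic shown in the diff (1.102). */
static void assign_seqno_old(double load, bool *previous_load_inited,
                             double *previous_load, unsigned long *host_seqno)
{
   if (*previous_load_inited && (*previous_load < load)) {
      (*host_seqno)++;
   } else {
      /* Tests the pointer itself, which is never NULL here,
         so the flag is never switched on. */
      if (!previous_load_inited) {
         *previous_load_inited = true;
      }
   }
   *previous_load = load;
}

/* Models the post-patch logic (1.103): the flag is always set. */
static void assign_seqno_new(double load, bool *previous_load_inited,
                             double *previous_load, unsigned long *host_seqno)
{
   if (*previous_load_inited && (*previous_load < load)) {
      (*host_seqno)++;
   }
   *previous_load_inited = true;
   *previous_load = load;
}

/* Feeds host loads (assumed sorted ascending) through one variant and
   returns the sequence number reached after the last host. */
static unsigned long final_seqno(assign_fn assign, const double *loads, size_t n)
{
   bool inited = false;      /* assumption: flag starts out false */
   double previous = 0.0;
   unsigned long seqno = 0;
   size_t i;

   assert(assign != NULL);
   for (i = 0; i < n; i++) {
      assign(loads[i], &inited, &previous, &seqno);
   }
   return seqno;
}
```

With loads of 0.1, 0.1, 0.5 and 0.9, the old variant leaves every host at sequence number 0, while the patched variant ends at 2, so hosts with a higher load sort after less loaded ones.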
>
>The strange thing is that I'm getting different results on my production 
>cluster and test cluster.  The test cluster was seeing problems only 
>with parallel jobs, and the above patch seems to have fixed it.  The 
>production cluster is having issues with both parallel and non-parallel 
>jobs, and that patch didn't seem to change anything.  I'll do some more 
>testing and let you know if I can figure out why it's going haywire.
>
Well, could it be that the two clusters use slightly different queue and
scheduler configurations? Could you post your configuration? The list might
be able to help you. Or just compare the configurations between the two grids.

>
>On a slightly different note, is there any word on when 6.0u4 might be 
>released?  I notice there have been a number of updates between 6.0u3 and 
>the current maintrunk.
>  
>
We are currently in the test phase and will have u4 ready as soon as our
tests are done. I do not know the exact schedule.

Cheers,
Stephan

>
>Thanks,
>
>
>Sean
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>  
>