[GE users] scheduling weirdness in 6.0u3

Sean Dilda agrajag at dragaera.net
Thu Apr 7 15:54:42 BST 2005


Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
> 
> Sean Dilda wrote:
> 
>>
>>
>>The strange thing is that I'm getting different results on my production 
>>cluster and test cluster.  The test cluster was seeing problems only 
>>with parallel jobs, and the above patch seems to have fixed it.  The 
>>production cluster is having issues with the parallel and non-parallel 
>>jobs, and that patch didn't seem to change anything.  I'll do some more 
>>testing and let you know if I can figure out why it's going haywire.
>>
> 
> Well, could it be that you use a slightly different queue and scheduler
> configuration? Could you post your configuration? The list might be able
> to help you. Or just compare the configurations between the two grids.

After continued failure to reproduce the error on my test cluster, I 
started to wonder if cluster size was an issue.  The test cluster has 8 
compute nodes.  The production cluster has over 300 execute hosts 
registered.

On a hunch, I changed the schedule_interval on the production cluster 
from 0:0:15 to 0:1:15.  I'm hesitant to call things fixed, but I haven't 
seen the error on the production cluster since I made that change.  This 
makes me wonder if sge_schedd has some kind of timeout for its 
scheduling run that is tied to the schedule_interval.  Even if it does, 
though, this doesn't seem quite right, as sge_schedd was hardly using 
any CPU time, even with the 15 second schedule_interval.  And when I 
temporarily turned on profiling for schedd, it was reporting only a 
little over one second to do a run.
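
To put numbers on that, here is a minimal sketch that converts the two
schedule_interval values to seconds and compares them with the profiled
run time.  The helper name interval_seconds is hypothetical, and the
1.0 second run time is only an approximation of the "little over one
second" that profiling reported.  The current value can be read from
the scheduler configuration, which I believe "qconf -ssconf" prints.

```python
def interval_seconds(value):
    """Convert an h:m:s string such as "0:1:15" to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


old_interval = interval_seconds("0:0:15")  # original production setting
new_interval = interval_seconds("0:1:15")  # setting after the change

# Assumption: approximate duration of one scheduling run, taken from
# the profiling output described above (a little over one second).
approx_run_time = 1.0

for label, interval in (("old", old_interval), ("new", new_interval)):
    share = approx_run_time / interval
    print(f"{label}: interval {interval}s, run uses about {share:.1%} of it")
```

Under these assumptions a run takes only a small fraction of either
interval, which is why a timeout tied to schedule_interval would not
fully explain the behaviour.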

Thanks,


Sean
