[GE dev] Review for 3 man CRs

mpospisil michael.pospisil at sun.com
Wed Jul 8 12:53:22 BST 2009


Ok, this seems fine to me. Please check in.

Michael

Jan Forch wrote:

> On 07/07/09 21:29, Michael Pospisil - Sun Microsystems - Prague Czech 
> Republic wrote:
>
>> Hello Honza,
>> please see notes below.
>>
>> Michael
>>
>>
>> Jan Forch wrote:
>>
>>> Hi team,
>>> could someone (preferably Michael ;-) ) please review a few man page 
>>> CRs for me:
>>>
>>> 1)
>>> CR 6806918 Man page entry for SCHEDULER_TIMEOUT incorrectly states 
>>> that default is 600
>>> -man page sge_conf.5
>>>
>>> cvs diff sge_conf.5
>>> Index: sge_conf.5
>>> ===================================================================
>>> RCS file: /cvs/gridengine/doc/man/man5/sge_conf.5,v
>>> retrieving revision 1.87
>>> diff -r1.87 sge_conf.5
>>> 1186,1188c1186,1189
>>> < Setting this parameter allows the scheduler GDI event acknowledge 
>>> timeout to be manually configured to a
>>> < specific value. Currently the default value is set to 10 minutes. 
>>> The \fISCHEDULER_TIMEOUT\fP value is
>>> < specified in seconds.
>>> ---
>>>  > Setting this parameter allows the scheduler GDI event acknowledge 
>>> timeout to be manually configured to a
>>>  > specific value. Currently the default value is 10 minutes with 
>>> default scheduler configuration, capped
>>>  > between 600 and 1200 seconds. But the default value depends on 
>>> current scheduler configuration. The
>>>  > The \fISCHEDULER_TIMEOUT\fP value is specified in seconds.
>>
>>
>> reworded a few things:
>>
>> Setting this parameter allows the scheduler GDI event acknowledge 
>> timeout to be manually configured to a
>> specific value. Currently the default value is 10 minutes with the 
>> default scheduler configuration and limited between 600 and 1200 
>> seconds. The default value depends on the current scheduler 
>> configuration. The \fISCHEDULER_TIMEOUT\fP value is specified in 
>> seconds.
>>
>>
>> But this CR is still not very clear to me. In the description it 
>> states that the timeout value is capped between 600 and 1200, yet in 
>> the comments it is written that the timeout is not restricted between 
>> 600 and 1200.
>>
>> Which one is correct??
>>
>       /* if set, use qmaster_params SCHEDULER_TIMEOUT */
>       if (ec_id == EV_ID_SCHEDD && scheduler_timeout > 0) {
>          timeout = scheduler_timeout;
>       } else {
>          /* is the ack timeout expired ? */
>          timeout = 10 * deliver_interval;
>          if (timeout < EVENT_ACK_MIN_TIMEOUT) {
>             timeout = EVENT_ACK_MIN_TIMEOUT;
>          } else if (timeout > EVENT_ACK_MAX_TIMEOUT) {
>             timeout = EVENT_ACK_MAX_TIMEOUT;
>          }
>       }
>
> If I understand it correctly, the value is limited only when 
> SCHEDULER_TIMEOUT is not set explicitly (that is, for the default value). 
> I would say that the comment intends to emphasize that the limit applies 
> only to the default value. I can state this explicitly, something like:
>
> Setting this parameter allows the scheduler GDI event acknowledge 
> timeout to be manually configured to a specific value. Currently the 
> default value is 10 minutes with the default scheduler configuration and 
> is limited to between 600 and 1200 seconds. The value is limited only in 
> the case of the default value. The default value depends on the current 
> scheduler configuration. The \fISCHEDULER_TIMEOUT\fP value is specified 
> in seconds.
>
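The selection logic in the quoted snippet can be restated as a standalone function, which makes it easier to see that only the computed default is clamped. This is a minimal sketch, not the qmaster code: the function name ack_timeout and the is_scheduler flag (standing in for ec_id == EV_ID_SCHEDD) are hypothetical, and the two bounds are assumed to be the 600 and 1200 seconds mentioned in the CR.

```c
#include <stdbool.h>

/* Assumed bounds: the 600 and 1200 seconds quoted in the CR text. */
#define EVENT_ACK_MIN_TIMEOUT 600
#define EVENT_ACK_MAX_TIMEOUT 1200

/*
 * Hypothetical helper mirroring the quoted snippet.
 * is_scheduler stands in for (ec_id == EV_ID_SCHEDD).
 */
static int ack_timeout(bool is_scheduler, int scheduler_timeout,
                       int deliver_interval)
{
    /* An explicit SCHEDULER_TIMEOUT is used as is, without clamping. */
    if (is_scheduler && scheduler_timeout > 0) {
        return scheduler_timeout;
    }

    /* Otherwise compute the default and clamp it to [MIN, MAX]. */
    int timeout = 10 * deliver_interval;
    if (timeout < EVENT_ACK_MIN_TIMEOUT) {
        timeout = EVENT_ACK_MIN_TIMEOUT;
    } else if (timeout > EVENT_ACK_MAX_TIMEOUT) {
        timeout = EVENT_ACK_MAX_TIMEOUT;
    }
    return timeout;
}
```

Under these assumptions, ack_timeout(true, 30, 15) yields 30 because the explicit value bypasses the limits, while ack_timeout(true, 0, 15) yields 600 because the computed 150 seconds is raised to the lower bound.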
>>>
>>> 2)
>>> CR 6786258 Man page should mention reprioritize_interval is coupled 
>>> to scheduler_interval
>>> -man page sched_conf.5
>>>
>>> cvs diff sched_conf.5
>>> Index: sched_conf.5
>>> ===================================================================
>>> RCS file: /cvs/gridengine/doc/man/man5/sched_conf.5,v
>>> retrieving revision 1.38
>>> diff -r1.38 sched_conf.5
>>> 410a411,417
>>>  > The reprioritization tickets are calculated by the scheduler and 
>>> update events
>>>  > for running jobs are only sent after the scheduler calculated new 
>>> values. How often
>>>  > the schedule should calculate the tickets is defined by 
>>> reprioritize_interval.
>>>  > Because the scheduler is only triggered in a specific interval 
>>> (scheduler_interval)
>>>  > this means the reprioritize_interval has only a meaning if set 
>>> greater than scheduler_interval.
>>>  > For example, if the scheduler_interval is 2 minutes and 
>>> reprioritize_interval is set
>>>  > to 10 seconds, this means the jobs get re-prioritized every 2 
>>> minutes.
>>>
>> "The" is missing in a few places...
>>
>> The reprioritization tickets are calculated by the scheduler, and 
>> update events for running jobs are only sent after the scheduler has 
>> calculated new values. How often the scheduler should calculate the 
>> tickets is defined by the reprioritize_interval. Because the scheduler 
>> is only triggered in a specific interval (scheduler_interval), the 
>> reprioritize_interval is only meaningful if set greater than the 
>> scheduler_interval. For example, if the scheduler_interval is 2 minutes 
>> and the reprioritize_interval is set to 10 seconds, the jobs get 
>> re-prioritized every 2 minutes.
>>
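The coupling between the two intervals described in item 2 can be illustrated with a small helper. This is only a sketch under stated assumptions: the function name is hypothetical, both values are taken as positive numbers of seconds, and a reprioritize_interval larger than the scheduler_interval is assumed to take effect unchanged (the real scheduler may additionally align it to its own runs).

```c
/*
 * Hypothetical helper: seconds between actual re-prioritizations.
 * Tickets are only recalculated when the scheduler runs, so a
 * reprioritize_interval shorter than scheduler_interval has no effect.
 * Assumes both arguments are positive numbers of seconds.
 */
static int effective_reprioritize_interval(int scheduler_interval,
                                           int reprioritize_interval)
{
    return reprioritize_interval > scheduler_interval
               ? reprioritize_interval
               : scheduler_interval;
}
```

With the values from the man page example, effective_reprioritize_interval(120, 10) gives 120, that is, every 2 minutes.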
>>> 3)
>>> CR 6291037 Relationship between suspend_threshold and 
>>> scheduler_interval needs to be documented
>>> -man page queue_conf.5
>>>
>>> cvs diff queue_conf.5
>>> Index: queue_conf.5
>>> ===================================================================
>>> RCS file: /cvs/gridengine/doc/man/man5/queue_conf.5,v
>>> retrieving revision 1.33
>>> diff -r1.33 queue_conf.5
>>> 151c151,156
>>> < jobs which are suspended.
>>> ---
>>>  > jobs which are suspended. There is an important relationship between
>>>  > \fsuspend_threshold\fP and \fscheduler_interval\fP. If you have 
>>> for example
>>>  > a suspend threshold on the np_load_avg, and the load exceeds the 
>>> threshold,
>>>  > this does not have immediate effect. Jobs continue running until 
>>> the next
>>>  > scheduling run, where scheduler detects the threshold has been 
>>> exceeded and
>>>  > sends an order to qmaster to suspend the job. Same for 
>>> unsuspending again.
>>>
>> just a few minor changes:
>>
>>
>> jobs which are suspended. There is an important relationship between the
>> \fIsuspend_threshold\fP and the \fIscheduler_interval\fP. If you have, 
>> for example, a suspend threshold on np_load_avg and the load exceeds 
>> the threshold, this does not have an immediate effect. Jobs continue 
>> running until the next scheduling run, where the scheduler detects that 
>> the threshold has been exceeded and sends an order to qmaster to 
>> suspend the job. The same applies to unsuspending.
>>
>
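The delay described in item 3, between the load exceeding a suspend threshold and the job actually being suspended, can be sketched as follows. The function name is hypothetical, and the sketch assumes scheduling runs happen at fixed multiples of scheduler_interval (0, interval, 2 * interval, ...), with times given in non-negative seconds and a positive interval; real scheduling runs are not necessarily this regular.

```c
/*
 * Hypothetical helper: time (in seconds) of the first scheduling run at
 * or after t_exceeded, the moment the load crossed the threshold.
 * Assumes runs at 0, interval, 2 * interval, ... with t_exceeded >= 0
 * and scheduler_interval > 0.
 */
static long next_scheduling_run(long t_exceeded, long scheduler_interval)
{
    /* Round t_exceeded up to the next multiple of the interval. */
    long runs = (t_exceeded + scheduler_interval - 1) / scheduler_interval;
    return runs * scheduler_interval;
}
```

For a 120 second scheduler_interval and a threshold crossed at second 130, next_scheduling_run(130, 120) gives 240, so the job keeps running for another 110 seconds before the suspend order is sent.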

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=39&dsMessageId=206143
