[GE dev] Review for 3 man CRs

jf222792 jan.forch at sun.com
Wed Jul 8 11:53:54 BST 2009


On 07/07/09 21:29, Michael Pospisil - Sun Microsystems - Prague Czech 
Republic wrote:
> Hello Honza,
> please see notes below.
>
> Michael
>
>
> Jan Forch wrote:
>
>> Hi team,
>> please could someone (preferably Michael ;-) ) review a few man page 
>> CRs for me:
>>
>> 1)
>> CR 6806918 Man page entry for SCHEDULER_TIMEOUT incorrectly states 
>> that default is 600
>> -man page sge_conf.5
>>
>> cvs diff sge_conf.5
>> Index: sge_conf.5
>> ===================================================================
>> RCS file: /cvs/gridengine/doc/man/man5/sge_conf.5,v
>> retrieving revision 1.87
>> diff -r1.87 sge_conf.5
>> 1186,1188c1186,1189
>> < Setting this parameter allows the scheduler GDI event acknowledge 
>> timeout to be manually configured to a
>> < specific value. Currently the default value is set to 10 minutes. 
>> The \fISCHEDULER_TIMEOUT\fP value is
>> < specified in seconds.
>> ---
>>  > Setting this parameter allows the scheduler GDI event acknowledge 
>> timeout to be manually configured to a
>>  > specific value. Currently the default value is 10 minutes with 
>> default scheduler configuration, capped
>>  > between 600 and 1200 seconds. But the default value depends on 
>> current scheduler configuration. The
>>  > The \fISCHEDULER_TIMEOUT\fP value is specified in seconds.
>
> reworded a few things:
>
> Setting this parameter allows the scheduler GDI event acknowledge 
> timeout to be manually configured to a
> specific value. Currently the default value is 10 minutes with the 
> default scheduler configuration and limited between 600 and 1200 
> seconds. The default value depends on the current scheduler 
> configuration. The \fISCHEDULER_TIMEOUT\fP value is specified in seconds.
>
>
> But this CR is still not very clear to me. The description states 
> that the timeout value is capped between 600 and 1200, yet the 
> comments say that the timeout is not restricted to the range between 
> 600 and 1200.
>
> Which one is correct?
>
      /* if set, use qmaster_params SCHEDULER_TIMEOUT */
      if (ec_id == EV_ID_SCHEDD && scheduler_timeout > 0) {
         timeout = scheduler_timeout;
      } else {
         /* is the ack timeout expired ? */
         timeout = 10 * deliver_interval;
        
         if (timeout < EVENT_ACK_MIN_TIMEOUT) {
            timeout = EVENT_ACK_MIN_TIMEOUT;
         } else if (timeout > EVENT_ACK_MAX_TIMEOUT) {
            timeout = EVENT_ACK_MAX_TIMEOUT;
         }
      }

If I understand the code correctly, the value is limited only when 
SCHEDULER_TIMEOUT is not set explicitly (that is, when the default value 
is used). I would say the comment is meant to emphasize that the limits 
apply only to the default value. I can state this explicitly, with 
something like:

Setting this parameter allows the scheduler GDI event acknowledge 
timeout to be manually configured to a
specific value. Currently the default value is 10 minutes with the 
default scheduler configuration and is limited to
between 600 and 1200 seconds. These limits apply only to the default 
value, not to an explicitly configured one. The default value depends
on the current scheduler configuration. The \fISCHEDULER_TIMEOUT\fP 
value is specified in seconds.
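
To make the behaviour easier to check, here is a minimal, self-contained sketch of the logic quoted above. The bounds 600 and 1200 are the ones mentioned in the CR. The numeric value of EV_ID_SCHEDD and the function name ack_timeout() are assumptions made only for this illustration and are not taken from the qmaster sources.

```c
/* Bounds as discussed in the CR (seconds). */
#define EVENT_ACK_MIN_TIMEOUT 600
#define EVENT_ACK_MAX_TIMEOUT 1200

/* Hypothetical id for the scheduler event client; the real value
 * is not shown in this thread. */
#define EV_ID_SCHEDD 1

/* Returns the acknowledge timeout in seconds.
 * scheduler_timeout <= 0 means SCHEDULER_TIMEOUT is not set. */
int ack_timeout(int ec_id, int scheduler_timeout, int deliver_interval)
{
    int timeout;

    /* An explicit SCHEDULER_TIMEOUT is used as is, without limits. */
    if (ec_id == EV_ID_SCHEDD && scheduler_timeout > 0) {
        return scheduler_timeout;
    }

    /* Default: ten delivery intervals, limited to [600, 1200]. */
    timeout = 10 * deliver_interval;
    if (timeout < EVENT_ACK_MIN_TIMEOUT) {
        timeout = EVENT_ACK_MIN_TIMEOUT;
    } else if (timeout > EVENT_ACK_MAX_TIMEOUT) {
        timeout = EVENT_ACK_MAX_TIMEOUT;
    }
    return timeout;
}
```

Under these assumptions, ack_timeout(EV_ID_SCHEDD, 30, 15) gives 30 (the explicit value, below the lower bound), while ack_timeout(EV_ID_SCHEDD, 0, 15) gives 600 (the default, raised to the lower bound).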

>>
>> 2)
>> CR 6786258 Man page should mention reprioritize_interval is coupled 
>> to scheduler_interval
>> -man page sched_conf.5
>>
>> cvs diff sched_conf.5
>> Index: sched_conf.5
>> ===================================================================
>> RCS file: /cvs/gridengine/doc/man/man5/sched_conf.5,v
>> retrieving revision 1.38
>> diff -r1.38 sched_conf.5
>> 410a411,417
>>  > The reprioritization tickets are calculated by the scheduler and 
>> update events
>>  > for running jobs are only sent after the scheduler calculated new 
>> values. How often
>>  > the schedule should calculate the tickets is defined by 
>> reprioritize_interval.
>>  > Because the scheduler is only triggered in a specific interval 
>> (scheduler_interval)
>>  > this means the reprioritize_interval has only a meaning if set 
>> greater than scheduler_interval.
>>  > For example, if the scheduler_interval is 2 minutes and 
>> reprioritize_interval is set
>>  > to 10 seconds, this means the jobs get re-prioritized every 2 
>> minutes.
>>
> "The" is missing in a few places...
>
> The reprioritization tickets are calculated by the scheduler, and update
> events for running jobs are only sent after the scheduler has calculated
> new values. How often the scheduler should calculate the tickets is
> defined by the reprioritize_interval. Because the scheduler is only
> triggered at a specific interval (scheduler_interval), the
> reprioritize_interval is only meaningful if it is set greater than the
> scheduler_interval. For example, if the scheduler_interval is 2 minutes
> and the reprioritize_interval is set to 10 seconds, the jobs get
> re-prioritized every 2 minutes.
>
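
The coupling described in the text above can be sketched as a small calculation. This is a simplified model, not the scheduler's actual code: it assumes that tickets can only be recalculated during a scheduling run, and that a recalculation happens at the first run after reprioritize_interval has elapsed. The function name effective_reprioritize_period() is hypothetical.

```c
/* Simplified model (an assumption, not the real scheduler logic):
 * returns the effective time between two reprioritizations in seconds,
 * i.e. reprioritize_interval rounded up to a whole number of
 * scheduling runs. Returns -1 for non-positive inputs. */
long effective_reprioritize_period(long reprioritize_interval,
                                   long scheduler_interval)
{
    long runs;

    if (reprioritize_interval <= 0 || scheduler_interval <= 0) {
        return -1;
    }

    /* Number of scheduling runs needed to cover the interval. */
    runs = (reprioritize_interval + scheduler_interval - 1)
           / scheduler_interval;
    return runs * scheduler_interval;
}
```

With the values from the example, effective_reprioritize_period(10, 120) gives 120, so the jobs are re-prioritized every 2 minutes even though 10 seconds was requested.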
>> 3)
>> CR 6291037 Relationship between suspend_threshold and 
>> scheduler_interval needs to be documented
>> -man page queue_conf.5
>>
>> cvs diff queue_conf.5
>> Index: queue_conf.5
>> ===================================================================
>> RCS file: /cvs/gridengine/doc/man/man5/queue_conf.5,v
>> retrieving revision 1.33
>> diff -r1.33 queue_conf.5
>> 151c151,156
>> < jobs which are suspended.
>> ---
>>  > jobs which are suspended. There is an important relationship between
>>  > \fsuspend_threshold\fP and \fscheduler_interval\fP. If you have 
>> for example
>>  > a suspend threshold on the np_load_avg, and the load exceeds the 
>> threshold,
>>  > this does not have immediate effect. Jobs continue running until 
>> the next
>>  > scheduling run, where scheduler detects the threshold has been 
>> exceeded and
>>  > sends an order to qmaster to suspend the job. Same for 
>> unsuspending again.
>>
> just a few minor changes:
>
>
> jobs which are suspended. There is an important relationship between the
> \fIsuspend_threshold\fP and the \fIscheduler_interval\fP. If you have,
> for example, a suspend threshold on the np_load_avg and the load exceeds
> the threshold, this does not have an immediate effect. Jobs continue
> running until the next scheduling run, where the scheduler detects that
> the threshold has been exceeded and sends an order to qmaster to suspend
> the job. The same applies to unsuspending.
>
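
The delay between the load exceeding the threshold and the job actually being suspended can be illustrated with a short sketch. It assumes, purely for illustration, that scheduling runs happen at times 0, scheduler_interval, 2 * scheduler_interval and so on, and that the suspend order takes effect at the run itself. The function names are hypothetical.

```c
/* Time (seconds) of the first scheduling run at or after time t,
 * assuming runs at 0, interval, 2 * interval, ...
 * Expects t >= 0 and scheduler_interval > 0. */
long next_scheduling_run(long t, long scheduler_interval)
{
    long runs = (t + scheduler_interval - 1) / scheduler_interval;
    return runs * scheduler_interval;
}

/* How long a job keeps running after the load exceeded the
 * suspend threshold at time exceeded_at. */
long suspend_delay(long exceeded_at, long scheduler_interval)
{
    return next_scheduling_run(exceeded_at, scheduler_interval)
           - exceeded_at;
}
```

For instance, with a scheduler_interval of 120 seconds and the threshold exceeded at second 130, the job continues to run for another 110 seconds, until the run at second 240. The same calculation applies to unsuspending.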

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=39&dsMessageId=206130
