[GE users] Fair share config, fill-up hosts and max user slots

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Wed Jan 11 08:35:25 GMT 2006


Hi Jean-Paul,

we do have a bug with displaying the sharetree data. I could not find any
issue with the actual sharetree computation. Those tests were all successful.

How long did you wait? Did you wait until the scheduler had made its run?
The priority information is only available when the scheduler is running
and after it has finished its first run.
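
A quick way to check this (only a sketch; the exact options are in the
qconf/qstat man pages of your GE version):

  qconf -sss                               # shows the host the scheduler (sge_schedd) runs on
  qconf -ssconf | grep schedule_interval   # length of one scheduling interval
  qstat -ext                               # after one interval, pending jobs should show ticket values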

Could you give me your entire configuration related to this problem?

Meaning:
- qstat -prio
- qstat -ext
- user configuration involved
- project configuration involved
- resource configuration qstat -sc
- qconf -sss output
- sharetree config.
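
For example, something like the following should collect most of it (only a
sketch; <user> and <project> are placeholders for the actual names):

  qstat -prio
  qstat -ext
  qconf -suserl; qconf -suser <user>      # user configuration
  qconf -sprjl; qconf -sprj <project>     # project configuration
  qconf -sc                               # resource (complex) configuration
  qconf -sss                              # scheduler state
  qconf -ssconf                           # scheduler configuration
  qconf -sstree                           # share tree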

So far I cannot replicate your issue. Did you build your binaries yourself?
Which architectures are you using?

Cheers,
Stephan



Jean-Paul Minet wrote on 01/06/06 12:00:

>Stephan,
>
>Trying to work around the possible fair share bug (is it confirmed?), I am
>trying to combine the functional policy and urgency (wait time only). I have
>the scheduler config (with slot urgency set to 0) detailed below. When I do a
>"qstat -prio", all pending jobs report 0 for "nurg" and "ntckts", whatever
>their waiting time is. Is that the expected behavior?
>
>Rgds
>
>Jean-paul
>--------------------
>Output of qconf -ssconf:
>
>algorithm                         default
>...
>maxujobs                          8
>queue_sort_method                 load
>job_load_adjustments              np_load_avg=0.50
>load_adjustment_decay_time        0:7:30
>load_formula                      slots
>schedd_job_info                   true
>flush_submit_sec                  0
>flush_finish_sec                  0
>params                            profile=1
>reprioritize_interval             0:0:0
>halftime                          336
>usage_weight_list                 cpu=0.848000,mem=0.152000,io=0.000000
>compensation_factor               5.000000
>weight_user                       1.00000
>weight_project                    0.000000
>weight_department                 0.000000
>weight_job                        0.000000
>weight_tickets_functional         1000000
>weight_tickets_share              1000000
>share_override_tickets            TRUE
>share_functional_shares           TRUE
>max_functional_jobs_to_schedule   200
>report_pjob_tickets               TRUE
>max_pending_tasks_per_job         50
>halflife_decay_list               none
>policy_hierarchy                  FS
>weight_ticket                     1.000000
>weight_waiting_time               0.010000
>weight_deadline                   3600000.000000
>weight_urgency                    0.010000
>weight_priority                   0.000000
>max_reservation                   0
>default_duration                  0:10:0
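
(For reference, sge_priority(5) combines the weights above roughly as

   prio = weight_priority * npprior + weight_urgency * nurg + weight_ticket * ntckts
   urg  = rrcontr + wtcontr + dlcontr
   wtcontr = waiting_time * weight_waiting_time
   dlcontr = weight_deadline / seconds_until_deadline     (0 without a deadline)

where nurg and ntckts are the urgency and ticket values normalized across the
pending jobs; see the man page for the exact normalization. With the slot
urgency set to 0 and no deadline, only the wait-time part should contribute
to urg.)
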
>
>
>
>Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>  
>
>>Hi Jean-Paul,
>>
>>I just did the test with the env you describe. I am sure, that you found 
>>a bug. In my tests, the
>>targeted resource share is allways 0 as you describe it. However, the 
>>actual resource share
>>is reported correctly.
>>
>>Cheers,
>>Stephan
>>
>>Jean-Paul Minet wrote:
>>
>>    
>>
>>>Hi,
>>>
>>>Our bi-proc cluster is used for sequential, OpenMP and MPI jobs.  We 
>>>wish to:
>>>
>>>1) use fair-share scheduling with equal shares for all users
>>>
>>>I have disabled Priority and Urgency scheduling and set the policy
>>>hierarchy to S:
>>>
>>>lemaitre ~ # qconf -ssconf
>>>algorithm                         default
>>>...
>>>halftime                          336
>>>usage_weight_list                 cpu=0.848000,mem=0.152000,io=0.000000
>>>...
>>>weight_tickets_functional         0
>>>weight_tickets_share              10000
>>>...
>>>policy_hierarchy                  S
>>>weight_ticket                     1.000000
>>>...
>>>weight_urgency                    0.000000
>>>weight_priority                   0.000000
>>>
>>>Under the share tree policy, I have only defined a default leaf under
>>>which all users appear, but "Actual resource share" and "Targeted
>>>resource share" remain 0 for all users, as if actual usage were not
>>>taken into account. This is confirmed by jobs being dispatched more in
>>>FIFO order than according to past usage. What's wrong?
>>>      
>>>
>>
>>    
>>
>>>2) limit the total number of CPUs/slots used by any user at any time:
>>>MaxJobs/User doesn't help, as a single MPI job can use many slots and is
>>>therefore not comparable to a sequential job. How can we implement this?
>>>
>>>3) fill up hosts with sequential jobs to leave as many empty nodes as
>>>possible for OpenMP and MPI jobs. I have read Stephen G.'s web log: am I
>>>correct in assuming that I have to define complex_values slots=2 for each
>>>of the bi-proc hosts (we don't want more jobs than CPUs) and that,
>>>thereafter, the scheduler will select the hosts with the fewest available
>>>slots (setting of course queue_sort_method=load and load_formula=slots)?
>>>
>>>Thanks for any help
>>>
>>>Jean-Paul
>>>
>>>
>>>      
>>>
>>    
>>
>
>  
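
(Regarding point 1 in the quoted message: a share tree with a single
"default" leaf is typically dumped by qconf -sstree in a form like the
following; the numbers here are only illustrative:

  id=0
  name=Root
  type=0
  shares=1
  childnodes=1
  id=1
  name=default
  type=0
  shares=1000
  childnodes=NONE

Note that per-user usage can only be accumulated for users that exist as GE
user objects, i.e. enforce_user is set to auto in the global configuration or
the users have been added with qconf -auser.)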
>
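
(And regarding point 3: a minimal sketch of the per-host slot limit described
there; "node001" is just a placeholder host name:

  qconf -mattr exechost complex_values slots=2 node001   # at most 2 slots on this host
  qconf -msconf                                          # queue_sort_method  load
                                                         # load_formula       slots

With queue_sort_method load and load_formula slots, the intent described in
the quoted message is that hosts with the fewest free slots sort first, so
sequential jobs fill up partially used nodes before empty ones are touched.)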

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



