[GE users] Fair share config, fill-up hosts and max user slots

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Wed Jan 11 15:06:42 GMT 2006



Jean-Paul Minet wrote on 01/11/06 10:51:

>Stephan,
>
>I am a bit puzzled... In order to work around the apparent bug we are discussing 
>(fair share usage not being taken into account), I have defined a functional tree 
>(all users with an equal number of shares).  Since then... it seems usage is accounted 
>for in the fair share tree/policy (actual resource share and combined usage are 
>calculated and displayed properly; targeted usage remains 0)!!!  I am not sure 
>of the causality between the two, but while I repeatedly had 0 as the value for 
>stckt (with qstat -ext), this is no longer the case, and I am not aware of having 
>done anything else in the config which could impact the fair share 
>policy/usage (besides, I did also set enforce_project to false in the cluster 
>config [it was set to true before], and set the default project to NONE for all 
>users, whereas it was set to specific projects before)...
>
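For reference, those two changes would normally be made with something like the
commands below ("someuser" is just a placeholder, and the field names are from
memory, so double-check them in the qconf editor screens):

   qconf -mconf             # global cluster configuration:  enforce_project  false
   qconf -muser someuser    # user entry:                    default_project  NONE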
>Now, there is still a problem with usage accounting: we have an InfiniBand 
>interconnect, and the tight integration doesn't work with MPI jobs.  I have 
>looked in the How-To's and found a package for IBA tight integration, but the 
>installed version of mvapich is earlier than the one required to apply the patch 
>to mpirun_rsh.c.  So for those jobs, resources are not accounted for 
>adequately... but this is a wider problem than SGE itself.  I am working on 
>getting mvapich upgraded.
>
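Regarding the tight integration: once the mvapich side is sorted out, the SGE side
is usually just a parallel environment along the lines of the sketch below. The PE
name and the script paths are placeholders taken from the generic MPI templates,
not from the IBA How-To itself, so treat it as a rough outline only:

   pe_name            mvapich_tight
   slots              999
   user_lists         NONE
   xuser_lists        NONE
   start_proc_args    /path/to/startmpi.sh -catch_rsh $pe_hostfile
   stop_proc_args     /path/to/stopmpi.sh
   allocation_rule    $fill_up
   control_slaves     TRUE      # slave tasks run under execd control, so their usage is accounted
   job_is_first_task  FALSE
   urgency_slots      min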
>Finally, the question I had about pending jobs remains valid: if some jobs are 
>in the scheduler waiting area (not being dispatched because maxujobs is 
>reached, or because resources are not available), shouldn't the scheduler also 
>display ticket/urgency and priority information for those jobs?
>
Well, that is a difficult discussion. We changed the behavior you are asking for
to the current one after SGE 6 was released. We decided that the tickets should
reflect the dispatch order: since some jobs will not be dispatched at all, because
they are in hold state or because the user has exceeded his limits, those jobs do
not get any tickets or priority information.
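
If you want to double check what the scheduler reports for pending jobs, something
like this is usually enough (report_pjob_tickets has to be TRUE, as it already is
in the configuration you posted, for pending-job tickets to be shown at all):

   qstat -prio -u '*'                          # urgency and ticket values as assigned by the scheduler
   qconf -ssconf | grep report_pjob_tickets    # must be TRUE for pending jobs to get ticket values in qstat

Jobs that the scheduler skips because of maxujobs or a hold will simply keep
showing 0 in those columns.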

Cheers,
Stephan

>
>Thnks again for your help
>
>Jean-Paul
>
>Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>  
>
>>Hi Jean-Paul,
>>
>>we do have a bug with displaying the sharetree data. I could not find any
>>issue with the actual sharetree computation. Those tests were all successful.
>>
>>How long did you wait? Did you wait until the scheduler had made its run? The
>>priority information is only available when the scheduler is running and after
>>it has finished its first run.
>>
>>Could you give me your entire configuration that is related to this
>>problem?
>>
>>Meaning:
>>- qstat -prio
>>- qstat -ext
>>- user configuration involved
>>- project configuration involved
>>- resource configuration qstat -sc
>>- qconf -sss output
>>- sharetree config.
>>
>>So far I cannot replicate your issue. Did you build your binaries yourself?
>>Which archs are you using?
>>
>>Cheers,
>>Stephan
>>
>>
>>
>>Jean-Paul Minet wrote on 01/06/06 12:00:
>>
>>>Stephan,
>>>
>>>Trying to work around the possible fair share bug (is it confirmed?), I am 
>>>trying to combine the functional policy and urgency (waiting time only).  I have 
>>>got the scheduler config (with slot urgency set to 0) detailed below.  When I do a 
>>>"qstat -prio", all pending jobs report 0 as "nurg" and "ntckts", whatever their 
>>>waiting time is.  Is that the expected behavior?
>>>
>>>Rgds
>>>
>>>Jean-paul
>>>--------------------
>>>Output of qconf -ssconf:
>>>
>>>algorithm                         default
>>>...
>>>maxujobs                          8
>>>queue_sort_method                 load
>>>job_load_adjustments              np_load_avg=0.50
>>>load_adjustment_decay_time        0:7:30
>>>load_formula                      slots
>>>schedd_job_info                   true
>>>flush_submit_sec                  0
>>>flush_finish_sec                  0
>>>params                            profile=1
>>>reprioritize_interval             0:0:0
>>>halftime                          336
>>>usage_weight_list                 cpu=0.848000,mem=0.152000,io=0.000000
>>>compensation_factor               5.000000
>>>weight_user                       1.00000
>>>weight_project                    0.000000
>>>weight_department                 0.000000
>>>weight_job                        0.000000
>>>weight_tickets_functional         1000000
>>>weight_tickets_share              1000000
>>>share_override_tickets            TRUE
>>>share_functional_shares           TRUE
>>>max_functional_jobs_to_schedule   200
>>>report_pjob_tickets               TRUE
>>>max_pending_tasks_per_job         50
>>>halflife_decay_list               none
>>>policy_hierarchy                  FS
>>>weight_ticket                     1.000000
>>>weight_waiting_time               0.010000
>>>weight_deadline                   3600000.000000
>>>weight_urgency                    0.010000
>>>weight_priority                   0.000000
>>>max_reservation                   0
>>>default_duration                  0:10:0
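
Just as a reminder of how the weights above interact: this is the rough shape of
the formula from sge_priority(5), quoted from memory rather than verbatim:

   prio = weight_priority * npprior  +  weight_urgency * nurg  +  weight_ticket * ntckts
   urg  = resource-request contribution (slot urgency etc.)
          + waiting_time * weight_waiting_time
          + deadline contribution (scaled by weight_deadline)

So with weight_priority at 0 and the slot urgency at 0, the nurg of a waiting job
comes almost entirely from the waiting-time term, but only for jobs the scheduler
has actually considered in a run.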
>>>
>>>
>>>
>>>Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>>>
>>>>Hi Jean-Paul,
>>>>
>>>>I just did the test with the env you describe. I am sure that you have found
>>>>a bug. In my tests, the targeted resource share is always 0, as you describe;
>>>>however, the actual resource share is reported correctly.
>>>>
>>>>Cheers,
>>>>Stephan
>>>>
>>>>Jean-Paul Minet wrote:
>>>>
>>>>>Hi,
>>>>>
>>>>>Our bi-proc cluster is used for sequential, OpenMP and MPI jobs.  We 
>>>>>wish to:
>>>>>
>>>>>1) use fair-share scheduling with equal shares for all users
>>>>>
>>>>>I have disabled priority and urgency scheduling, and set the policy 
>>>>>hierarchy to S:
>>>>>
>>>>>lemaitre ~ # qconf -ssconf
>>>>>algorithm                         default
>>>>>...
>>>>>halftime                          336
>>>>>usage_weight_list                 cpu=0.848000,mem=0.152000,io=0.000000
>>>>>...
>>>>>weight_tickets_functional         0
>>>>>weight_tickets_share              10000
>>>>>...
>>>>>policy_hierarchy                  S
>>>>>weight_ticket                     1.000000
>>>>>...
>>>>>weight_urgency                    0.000000
>>>>>weight_priority                   0.000000
>>>>>
>>>>>Under the share tree policy, I have only defined a default leaf under 
>>>>>which all users appear, but "Actual resource share" and "Targeted 
>>>>>resource share" remain 0 for all users, as if actual usage were not 
>>>>>taken into account.  This is confirmed by jobs being dispatched more 
>>>>>in FIFO order than according to past usage.  What's wrong?
>>>>>
>>>>>2) limit the total number of CPUs/slots used by any user at any time: 
>>>>>MaxJobs/User doesn't help, as a single MPI job can use many slots and 
>>>>>is therefore not comparable to a sequential job.  How can we implement this?
>>>>>
>>>>>3) fill up hosts with sequential jobs so as to leave as many empty nodes 
>>>>>as possible for OpenMP and MPI jobs.  I have read Stephen G.'s Web Log: am I 
>>>>>correct in assuming that I have to define complex_values slots=2 for each of 
>>>>>the bi-proc hosts (we don't want more jobs than CPUs) and that, thereafter, 
>>>>>the scheduler will select the hosts with the least available slots 
>>>>>(setting of course queue_sort_method=load and load_formula=slots)?
>>>>>
>>>>>Thanks for any help
>>>>>
>>>>>Jean-Paul
>>>>>
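Coming back to points 1) and 3) of this older mail, the setups look roughly like
the sketches below. Node ids, share values and the host name "node001" are only
illustrative, and the share tree dump format is quoted from memory, so compare it
with your own qconf -sstree output:

   # share tree with a single "default" leaf, so all users get equal shares:
   id=0
   name=Root
   type=0
   shares=1
   childnodes=1
   id=1
   name=default
   type=0
   shares=1000
   childnodes=NONE

   # fill-up of the bi-proc hosts:
   qconf -me node001    # exec host:  complex_values     slots=2
   qconf -msconf        # scheduler:  queue_sort_method  load
                        #             load_formula       slots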

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



