[GE users] Fair share config, fill-up hosts and max user slots

Jean-Paul Minet minet at cism.ucl.ac.be
Wed Jan 11 09:51:27 GMT 2006


I am a bit puzzled. To work around the apparent bug we are discussing 
(fair share usage not taken into account), I have defined a functional tree 
(all users with an equal number of shares).  Since then, usage seems to be 
accounted for in the fair share tree/policy: actual resource share and combined 
usage are calculated and displayed properly, although targeted usage remains 0. 
I am not sure the two are causally related, but whereas I repeatedly got 0 as 
the value for stckt (with qstat -ext), this is no longer the case.  I am not 
aware of having done anything else in the config that could affect the fair 
share policy/usage, apart from two changes: I set enforce_project to false in 
the cluster config (it was set to true before), and I set the default project to 
NONE for all users, whereas it was set to specific projects before.
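
As an illustration only, here is a rough sketch of what "combined usage" is 
taken to mean in this thread: a weighted sum of the cpu, mem and io usage 
values, decayed over time with the configured half-life.  The weights and the 
halftime of 336 hours come from the scheduler configuration quoted further 
down.  The function names are hypothetical, and the plain exponential decay is 
an assumption, not something taken from the SGE sources.

```python
# Values copied from the usage_weight_list and halftime settings quoted below.
USAGE_WEIGHTS = {"cpu": 0.848, "mem": 0.152, "io": 0.0}
HALFTIME_HOURS = 336.0


def combined_usage(usage):
    """Weighted sum of raw usage values (hypothetical helper, not an SGE API)."""
    return sum(weight * usage.get(name, 0.0)
               for name, weight in USAGE_WEIGHTS.items())


def decayed(value, hours_elapsed, halftime=HALFTIME_HOURS):
    """Assumed exponential decay: the value halves every `halftime` hours."""
    return value * 0.5 ** (hours_elapsed / halftime)


if __name__ == "__main__":
    raw = {"cpu": 1200.0, "mem": 300.0, "io": 50.0}
    now = combined_usage(raw)
    print("combined usage now:", now)
    print("same usage seen two weeks later:", decayed(now, 336.0))
```

With io weighted at 0, I/O never contributes to the combined figure under 
this reading, whatever the raw value is.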

Now, there is still a problem with usage accounting: we have an InfiniBand 
interconnect, and the tight integration doesn't work with MPI jobs.  I have 
looked in the How-To's and found a package for IBA tight integration, but the 
installed version of mvapich is earlier than the one required to apply the 
patch to mpirun_rsh.c.  So for those jobs, resources are not accounted for 
adequately, but this is a wider problem than SGE itself.  I am working on 
getting mvapich upgraded.

Finally, my question about pending jobs still stands: if some jobs are 
in the scheduler waiting area (not being dispatched because maxujobs is 
reached, or because resources are not available), shouldn't the scheduler also 
display tickets/urgency and priority information for those jobs?
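
To make the question concrete, here is a minimal sketch of how the nurg, 
ntckts and final priority values are expected to combine, following the 
formula given in sge_priority(5): prio = weight_priority * npprior + 
weight_urgency * nurg + weight_ticket * ntckts.  The weights are the ones from 
the scheduler configuration quoted below.  The min-max normalization, the 0.5 
used when all jobs have the same value, and the dictionary keys are 
assumptions made for illustration; resource and deadline contributions to 
urgency are left out.

```python
# Weights copied from the qconf -ssconf output quoted below.
WEIGHT_TICKET = 1.0
WEIGHT_URGENCY = 0.01
WEIGHT_PRIORITY = 0.0
WEIGHT_WAITING_TIME = 0.01


def normalize(values):
    """Min-max normalization to [0, 1]; 0.5 if all values are equal (assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def priorities(jobs):
    """jobs: list of dicts with hypothetical keys wait_s, tickets, posix_prio."""
    # Waiting-time contribution only; slot urgency is set to 0 in this setup.
    urgency = [WEIGHT_WAITING_TIME * job["wait_s"] for job in jobs]
    nurg = normalize(urgency)
    ntckts = normalize([job["tickets"] for job in jobs])
    npprior = normalize([job["posix_prio"] for job in jobs])
    return [WEIGHT_PRIORITY * p + WEIGHT_URGENCY * u + WEIGHT_TICKET * t
            for p, u, t in zip(npprior, nurg, ntckts)]


if __name__ == "__main__":
    pending = [
        {"wait_s": 0, "tickets": 0, "posix_prio": 0},
        {"wait_s": 3600, "tickets": 1000, "posix_prio": 0},
    ]
    print(priorities(pending))
```

Under these assumptions, two pending jobs with different waiting times should 
never both show 0 for nurg, which is why the all-zero output looks suspicious.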

Thanks again for your help


Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
> Hi Jean-Paul,
> we do have a bug with displaying the sharetree data. I could not find any
> issue with the actual sharetree computation. Those tests were all successful.
> How long did you wait? Did you wait until the scheduler made its run? The
> priority information is only available when the scheduler is running and
> after it has finished its first run.
> Could you give me your entire configuration that is related to this
> problem?
> Meaning:
> - qstat -prio
> - qstat -ext
> - user configuration involved
> - project configuration involved
> - resource configuration qstat -sc
> - qconf -sss output
> - sharetree config.
> So far I cannot replicate your issue. Did you build your binaries yourself?
> Which archs are you using?
> Cheers,
> Stephan
> Jean-Paul Minet wrote On 01/06/06 12:00,:
>>Trying to work around the possible fair share bug (is it confirmed?), I am 
>>trying to combine functional policy and urgency (wait time only).  I have got 
>>the  scheduler config (with slot urgency set to 0) detailed below.  When I do a 
>>"qstat -prio", all pending jobs report 0 as "nurg" and "ntckts", whatever their 
>>waiting time is.  Is that the expected behavior?
>>Output of qconf -ssconf:
>>algorithm                         default
>>maxujobs                          8
>>queue_sort_method                 load
>>job_load_adjustments              np_load_avg=0.50
>>load_adjustment_decay_time        0:7:30
>>load_formula                      slots
>>schedd_job_info                   true
>>flush_submit_sec                  0
>>flush_finish_sec                  0
>>params                            profile=1
>>reprioritize_interval             0:0:0
>>halftime                          336
>>usage_weight_list                 cpu=0.848000,mem=0.152000,io=0.000000
>>compensation_factor               5.000000
>>weight_user                       1.00000
>>weight_project                    0.000000
>>weight_department                 0.000000
>>weight_job                        0.000000
>>weight_tickets_functional         1000000
>>weight_tickets_share              1000000
>>share_override_tickets            TRUE
>>share_functional_shares           TRUE
>>max_functional_jobs_to_schedule   200
>>report_pjob_tickets               TRUE
>>max_pending_tasks_per_job         50
>>halflife_decay_list               none
>>policy_hierarchy                  FS
>>weight_ticket                     1.000000
>>weight_waiting_time               0.010000
>>weight_deadline                   3600000.000000
>>weight_urgency                    0.010000
>>weight_priority                   0.000000
>>max_reservation                   0
>>default_duration                  0:10:0
>>Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>>>Hi Jean-Paul,
>>>I just did the test with the env you describe. I am sure that you found 
>>>a bug. In my tests, the targeted resource share is always 0, as you 
>>>describe. However, the actual resource share is reported correctly.
>>>Jean-Paul Minet wrote:
>>>>Our bi-proc cluster is used for sequential, OpenMP and MPI jobs.  We 
>>>>wish to:
>>>>1) use fair-share scheduling with equal shares for all users
>>>>I have disabled Priority and Urgency scheduling, and set policy 
>>>>hierarchy to S.:
>>>>lemaitre ~ # qconf -ssconf
>>>>algorithm                         default
>>>>halftime                          336
>>>>usage_weight_list                 cpu=0.848000,mem=0.152000,io=0.000000
>>>>weight_tickets_functional         0
>>>>weight_tickets_share              10000
>>>>policy_hierarchy                  S
>>>>weight_ticket                     1.000000
>>>>weight_urgency                    0.000000
>>>>weight_priority                   0.000000
>>>>Under the share tree policy, I have only defined a default leaf under 
>>>>which all users appear, but "Actual resource share" and "Targeted 
>>>>resource share" remain 0 for all users, as if actual usage was not 
>>>>taken into account?  This is confirmed by jobs being dispatched more 
>>>>like in FIFO order than following past usage. What's wrong?
>>>>2) limit the total number of CPUs/slots used by any user at any time: 
>>>>MaxJobs/User doesn't help as a single MPI job can use many slots and 
>>>>therefore cannot compare to a sequential job.  How can we implement this?
>>>>3) fill up hosts with sequential jobs, to leave as many nodes as possible 
>>>>empty for OpenMP and MPI jobs.  I have read Stephen G.'s weblog: am I 
>>>>correct in assuming that I have to define complex_values slots=2 for each 
>>>>of the biproc hosts (we don't want more jobs than CPUs) and that, 
>>>>thereafter, the scheduler will select the hosts with the fewest available 
>>>>slots (setting of course queue_sort_method=load and load_formula=slots)?
>>>>Thanks for any help
>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net

Jean-Paul Minet
Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
Université Catholique de Louvain
Tel: (32) (0) - Fax: (32) (0)
