[GE users] Minimum Number of Nodes/Job

murphygb brian.murphy at siemens.com
Fri Jul 9 18:18:36 BST 2010


> > > Am 09.07.2010 um 16:21 schrieb murphygb:
> > > 
> > > >> Am 09.07.2010 um 01:31 schrieb murphygb:
> > > >> 
> > > >>> What configuration do I use to ensure SGE spreads my distributed jobs across the fewest number of nodes, i.e., so that node selection starts with the node with the most available slots and works its way down?  Right now SGE (6.2u5) is, for some reason, choosing loaded machines over unloaded machines.  I have tried a scheduler configuration like:
> > > >>> 
> > > >>> algorithm                         default
> > > >>> schedule_interval                 0:0:15
> > > >>> maxujobs                          0
> > > >>> queue_sort_method                 load
> > > >>> job_load_adjustments              np_load_avg=0.20
> > > >> 
> > > >> As np_load_avg isn't used for the load_formula, the adjustments can be set to NONE here.
> > > >> 
> > > >>> load_adjustment_decay_time        0:5:00
> > > >>> load_formula                      -slots
> > > >> 
> > > >> Maybe "slots" will work to have a fill-up policy.
> > > >> 
> > > >> -- Reuti
> > > >> 
> > > > For load_formula, I have tried +slots, -slots, +num_proc, and -num_proc.  None of them affects queue selection: the job goes to the same machines in the same order every time.  The allocation rule for the PE for these jobs is set to $fill_up.  Even with the load formula set to np_load_avg, machines with load are being selected over machines with no load.
> > > 
> > > Correct, for parallel jobs this doesn't apply:
> > > 
> > > http://blogs.sun.com/sgrell/entry/grid_engine_scheduler_hacks_least
> > > 
> > > -- Reuti
> > > 
> > Yes, I have tried this configuration, and I am not getting the expected behavior.  At submit time I have two machines available: one already has 7 of 8 slots in use and a load of 7.05, while the other has 0 of 8 slots in use and a load of 0.00.  Using the sconf settings for "use least used host first", my sconf looks like this:
> > 
> > algorithm                         default
> > schedule_interval                 0:0:15
> > maxujobs                          0
> > queue_sort_method                 load
> > job_load_adjustments              NONE
> > load_adjustment_decay_time        0:0:00
> > load_formula                      -slots
> > schedd_job_info                   true
> > flush_submit_sec                  1
> > flush_finish_sec                  1
> > params                            NONE
> > reprioritize_interval             0:0:0
> > halftime                          168
> > usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> > compensation_factor               5.000000
> > weight_user                       0.100000
> > weight_project                    0.700000
> > weight_department                 0.100000
> > weight_job                        0.100000
> > weight_tickets_functional         100000
> > weight_tickets_share              0
> > share_override_tickets            TRUE
> > share_functional_shares           TRUE
> > max_functional_jobs_to_schedule   200
> > report_pjob_tickets               TRUE
> > max_pending_tasks_per_job         50
> > halflife_decay_list               none
> > policy_hierarchy                  OF
> > weight_ticket                     10.000000
> > weight_waiting_time               0.000000
> > weight_deadline                   3600000.000000
> > weight_urgency                    0.100000
> > weight_priority                   1.000000
> > max_reservation                   1
> > default_duration                  INFINITY
> > 
> > I submitted a job requesting 8 processors.  I would expect the entire job to land on the 8-processor machine that has no load.  This is not the case: SGE took one processor from the loaded machine first and then used 7 processors from the completely unloaded machine.
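For anyone reproducing this, the per-host slot usage the scheduler sees can be checked with the standard SGE client commands; a minimal sketch (the job id below is a placeholder):

```shell
# List every queue instance with its reserved/used/total slot counts,
# e.g. "0/7/8" means 7 of 8 slots are already in use on that host.
qstat -f

# Show only the "slots" complex per queue instance.
qstat -F slots

# With schedd_job_info set to true (as in the sconf above), ask the
# scheduler for its placement reasoning on a pending job.
qstat -j <job_id>
```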
> > > 
> > > >> 
> > > >>> schedd_job_info                   true
> > > >>> flush_submit_sec                  1
> > > >>> flush_finish_sec                  1
> > > >>> params                            PE_RANGE_ALG=highest,MONITOR=1
> > > >>> reprioritize_interval             0:0:0
> > > >>> halftime                          168
> > > >>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> > > >>> compensation_factor               5.000000
> > > >>> weight_user                       0.100000
> > > >>> weight_project                    0.700000
> > > >>> weight_department                 0.100000
> > > >>> weight_job                        0.100000
> > > >>> weight_tickets_functional         100000
> > > >>> weight_tickets_share              0
> > > >>> share_override_tickets            TRUE
> > > >>> share_functional_shares           TRUE
> > > >>> max_functional_jobs_to_schedule   200
> > > >>> report_pjob_tickets               TRUE
> > > >>> max_pending_tasks_per_job         50
> > > >>> halflife_decay_list               none
> > > >>> policy_hierarchy                  OF
> > > >>> weight_ticket                     10.000000
> > > >>> weight_waiting_time               0.000000
> > > >>> weight_deadline                   3600000.000000
> > > >>> weight_urgency                    0.100000
> > > >>> weight_priority                   1.000000
> > > >>> max_reservation                   1
> > > >>> default_duration                  INFINITY
> > > >>> 
> > > > 
> > > > 
> Looks like I got it to work.  I missed the part about the allocation rule having to be pe_slots.  Thanks.
> 
The problem is that the allocation rule having to be pe_slots means the job must fit entirely on one node, which is useless to me.  There must be another way...
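One workaround that avoids pe_slots is to take load out of the sort entirely: order the queue instances by a fixed sequence number and keep the PE's $fill_up rule, so the scheduler always packs hosts in the same order.  A sketch only; the PE name "mpi", queue "all.q", and host names are hypothetical:

```shell
# In the scheduler configuration, sort queue instances by seq_no
# instead of by load:
#   queue_sort_method    seqno
qconf -msconf

# Give each queue instance an explicit order; lower seq_no fills first:
#   seq_no    0,[host1=1],[host2=2]
qconf -mq all.q

# Keep the parallel environment packing slots host by host:
#   allocation_rule    $fill_up
qconf -mp mpi
```

Whether $fill_up then honors the seq_no order in 6.2u5 is worth verifying on a test cluster first, since, as this thread shows, parallel job placement does not follow all of the serial-job sort rules.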


------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=266942

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


