[GE users] Minimum Number of Nodes/Job
murphygb
brian.murphy at siemens.com
Fri Jul 9 18:18:36 BST 2010
> > > On 09.07.2010 at 16:21, murphygb wrote:
> > >
> > > >> On 09.07.2010 at 01:31, murphygb wrote:
> > > >>
> > > >>> What configuration makes SGE spread my distributed jobs across the fewest nodes, i.e., node selection starts with the node that has the most available slots and then works its way down? Right now SGE (6.2u5) seems to pick loaded machines over unloaded machines at random. I have tried a scheduler configuration like:
> > > >>>
> > > >>> algorithm default
> > > >>> schedule_interval 0:0:15
> > > >>> maxujobs 0
> > > >>> queue_sort_method load
> > > >>> job_load_adjustments np_load_avg=0.20
> > > >>
> > > >> As np_load_avg isn't used for the load_formula, the adjustments can be set to NONE here.
> > > >>
> > > >>> load_adjustment_decay_time 0:5:00
> > > >>> load_formula -slots
> > > >>
> > > >> Maybe "slots" will work to get a fill-up policy.
> > > >>
> > > >> -- Reuti
> > > >>
> > > > For load_formula, I have tried +slots, -slots, +num_proc, and -num_proc. None of them affects queue selection: the job goes to the same machines in the same order every time. The allocation rule of the PE for these jobs is set to $fill_up. Even when I set load_formula to np_load_avg, machines with load are selected over machines with no load.
> > >
> > > Correct, for parallel jobs this doesn't apply:
> > >
> > > http://blogs.sun.com/sgrell/entry/grid_engine_scheduler_hacks_least
> > >
> > > -- Reuti
> > >
> > Yes, I have tried this configuration and I am not getting the expected behavior. At submit time I have 2 machines available. One machine already has 7 of 8 slots in use and has a load of 7.05. The other machine has 0 of 8 slots in use and has a load of 0.00. Using the settings for "use least used host first", my scheduler configuration looks like this:
> >
> > algorithm default
> > schedule_interval 0:0:15
> > maxujobs 0
> > queue_sort_method load
> > job_load_adjustments NONE
> > load_adjustment_decay_time 0:0:00
> > load_formula -slots
> > schedd_job_info true
> > flush_submit_sec 1
> > flush_finish_sec 1
> > params NONE
> > reprioritize_interval 0:0:0
> > halftime 168
> > usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
> > compensation_factor 5.000000
> > weight_user 0.100000
> > weight_project 0.700000
> > weight_department 0.100000
> > weight_job 0.100000
> > weight_tickets_functional 100000
> > weight_tickets_share 0
> > share_override_tickets TRUE
> > share_functional_shares TRUE
> > max_functional_jobs_to_schedule 200
> > report_pjob_tickets TRUE
> > max_pending_tasks_per_job 50
> > halflife_decay_list none
> > policy_hierarchy OF
> > weight_ticket 10.000000
> > weight_waiting_time 0.000000
> > weight_deadline 3600000.000000
> > weight_urgency 0.100000
> > weight_priority 1.000000
> > max_reservation 1
> > default_duration INFINITY
> >
> > I submitted a job requesting 8 processors. I would expect the entire job to go to the 8-processor machine that has no load. This is not the case: SGE used one processor from the loaded machine first and then 7 processors from the completely unloaded machine.
> > >
> > > >>
> > > >>> schedd_job_info true
> > > >>> flush_submit_sec 1
> > > >>> flush_finish_sec 1
> > > >>> params PE_RANGE_ALG=highest,MONITOR=1
> > > >>> reprioritize_interval 0:0:0
> > > >>> halftime 168
> > > >>> usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
> > > >>> compensation_factor 5.000000
> > > >>> weight_user 0.100000
> > > >>> weight_project 0.700000
> > > >>> weight_department 0.100000
> > > >>> weight_job 0.100000
> > > >>> weight_tickets_functional 100000
> > > >>> weight_tickets_share 0
> > > >>> share_override_tickets TRUE
> > > >>> share_functional_shares TRUE
> > > >>> max_functional_jobs_to_schedule 200
> > > >>> report_pjob_tickets TRUE
> > > >>> max_pending_tasks_per_job 50
> > > >>> halflife_decay_list none
> > > >>> policy_hierarchy OF
> > > >>> weight_ticket 10.000000
> > > >>> weight_waiting_time 0.000000
> > > >>> weight_deadline 3600000.000000
> > > >>> weight_urgency 0.100000
> > > >>> weight_priority 1.000000
> > > >>> max_reservation 1
> > > >>> default_duration INFINITY
> > > >>>
> > > >>> ------------------------------------------------------
> > > >>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=266805
> > > >>>
> > > >>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> > > >
> > > >
> Looks like I got it to work. I missed the part about the allocation rule having to be $pe_slots. Thanks.
>
The problem is that requiring the allocation rule to be $pe_slots means each job must fit entirely on one node, which is useless to me. There must be another way.
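
To make the desired behavior concrete, here is a toy Python model of the two-machine example from this thread. It is only a sketch and not how the SGE scheduler is implemented: the function names fill_up and single_host and the host names "loaded" and "idle" are made up for illustration. It compares filling hosts in the order SGE appeared to use, filling them in order of most free slots first, and a single-host rule in the spirit of $pe_slots.

```python
# Toy model only: this is NOT the SGE scheduler's algorithm.
# Host names and helper names are hypothetical; the slot counts
# mirror the two machines described in this thread.

def fill_up(hosts, slots_needed):
    """Take free slots host by host, in the given order, until the request is met."""
    allocation = {}
    remaining = slots_needed
    for name, free in hosts:
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            allocation[name] = take
            remaining -= take
    # None means the request cannot be satisfied with the free slots available.
    return allocation if remaining == 0 else None


def single_host(hosts, slots_needed):
    """Single-host rule: the whole job must fit on one host."""
    for name, free in hosts:
        if free >= slots_needed:
            return {name: slots_needed}
    return None


# (host, free slots): "loaded" has 7 of 8 slots in use, "idle" has 0 of 8 in use.
hosts = [("loaded", 1), ("idle", 8)]
most_free_first = sorted(hosts, key=lambda h: h[1], reverse=True)

observed = fill_up(hosts, 8)                # order SGE appeared to use
wanted = fill_up(most_free_first, 8)        # fewest nodes: emptiest host first
one_node_only = single_host(hosts, 9)       # a 9-slot job fits on no single host
multi_node = fill_up(most_free_first, 9)    # but it fits when spanning hosts

print(observed)       # {'loaded': 1, 'idle': 7}
print(wanted)         # {'idle': 8}
print(one_node_only)  # None
print(multi_node)     # {'idle': 8, 'loaded': 1}
```

The last two results show why a single-host rule does not cover this case: a job larger than any one node can never be placed, whereas what is wanted is the most-free-first ordering while still allowing a job to span nodes.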
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=266942