[GE users] Scheduler tuning

Robert Healey healer at rpi.edu
Mon Dec 1 16:20:06 GMT 2008


I'm using fill_up and my end users are still complaining about jobs
being spread out across the various nodes.  Does anyone have any other
suggestions on how to remedy this?
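One remedy worth considering (not something tried in this thread; the PE name,
queue attachment, and job script below are assumptions): if whole-node placement
is a hard requirement, a second parallel environment with allocation_rule
$pe_slots forces all slots of a job onto a single host, per sge_pe(5).  A
minimal sketch:

    # Hypothetical PE; create with "qconf -ap openmpi_node", inspect with "qconf -sp".
    pe_name            openmpi_node
    slots              1310
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $pe_slots     # all slots of a job on one host
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min

    # Attach the new PE to the queue (queue name "terra" taken from the
    # qstat output further down) and submit an 8-slot job against it:
    qconf -aattr queue pe_list openmpi_node terra
    qsub -pe openmpi_node 8 myjob.sh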

Margaret Doll wrote:
> I found that if I had my parallel environment set to "round_robin", I
> got into the situation that you describe.
> 
> I switched the PE to "fill_up", and the first compute node's slots are
> all used before requests are made to the next compute nodes.
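For reference, the change Margaret describes is a one-line edit to the parallel
environment; a sketch, assuming the PE in question is the "openmpi" PE shown
further down in this thread:

    # Show the current allocation rule
    qconf -sp openmpi | grep allocation_rule

    # Open the PE in $EDITOR and change
    #   allocation_rule    $round_robin
    # to
    #   allocation_rule    $fill_up
    qconf -mp openmpi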
> 
> 
> On Nov 19, 2008, at 12:09 PM, Bob Healey wrote:
> 
>> Thank you to everyone who responded overnight.  I've taken the
>> suggestions from the three emails I saw today and will be trying them
>> out; if that doesn't change anything, I'll post again.  For the person
>> who asked, the only resource being requested is slots.  I still haven't
>> gotten the end users to put in h_rt limits, which I need before I can
>> get backfilling working.  But that's a people problem, not a tech
>> issue.
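For completeness, a hedged sketch of what those h_rt requests could look like
once the users provide them (the 24-hour limit and job script name are
invented); a wallclock limit together with a reservation is what lets the
scheduler backfill short jobs around large pending ones:

    # Per-job: request a hard run-time limit and a reservation at submit time
    qsub -pe openmpi 8 -l h_rt=24:00:00 -R y myjob.sh

    # Or set a cluster-wide default request by adding this line to
    # $SGE_ROOT/$SGE_CELL/common/sge_request:
    -l h_rt=24:00:00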
>>
>> Bob Healey
>> Systems Administrator
>> Molecularium Project
>> Department of Physics, Applied Physics, and Astronomy
>> healer at rpi.edu
>>
>> ==============Original message text===============
>> On Wed, 19 Nov 2008 5:11:41 EST andreas wrote:
>>
>> Hi Robert,
>>
>> I don't have full insight into your setup, but load adjustment has a
>> say in the scheduler's parallel job allocation.  Try replacing
>>
>>> job_load_adjustments              np_load_avg=0.50
>>> load_adjustment_decay_time        0:7:30
>> with
>>
>>> job_load_adjustments              NONE
>>> load_adjustment_decay_time        0:0:0
>> It is possible that scheduling will then work as you expect.
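A sketch of how that change is usually applied and then verified (qconf -msconf
is interactive, so the editor step is only described in comments):

    # Open the scheduler configuration in $EDITOR and set:
    #   job_load_adjustments              NONE
    #   load_adjustment_decay_time        0:0:0
    qconf -msconf

    # Confirm the values took effect
    qconf -ssconf | egrep 'job_load_adjustments|load_adjustment_decay_time'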
>>
>> Regards,
>> Andreas
>>
>> On Wed, 19 Nov 2008, Robert Healey wrote:
>>
>>> I'm currently using that flag; it doesn't seem to help much.  I also
>>> use slots as the scheduling criterion instead of load.
>>>
>>> Bob Healey
>>>
>>>
>>> qstat -t:
>>>  12645 0.51180 submit-run leyva        r     11/18/2008 16:02:12
>>>      terra@compute-8-9.local    MASTER                   r   00:00:02     0.15357      0.00000
>>>      terra@compute-8-9.local    SLAVE   1.compute-8-9    r   1:17:38:46   51351.78424  0.00000
>>>      terra@compute-8-9.local    SLAVE
>>>      terra@compute-8-9.local    SLAVE
>>>      terra@compute-8-9.local    SLAVE
>>>  12645 0.51180 submit-run leyva        r     11/18/2008 16:02:12
>>>      terra@compute-8-10.local   SLAVE
>>>      terra@compute-8-10.local   SLAVE   1.compute-8-10   r   1:17:41:06   51175.77672  0.00000
>>>      terra@compute-8-10.local   SLAVE
>>>      terra@compute-8-10.local   SLAVE
>>>
>>> pe_name            openmpi
>>> slots              1310
>>> user_lists         NONE
>>> xuser_lists        NONE
>>> start_proc_args    /bin/true
>>> stop_proc_args     /bin/true
>>> allocation_rule    $fill_up
>>> control_slaves     TRUE
>>> job_is_first_task  FALSE
>>> urgency_slots      min
>>> accounting_summary TRUE
>>>
>>> qconf -msconf:
>>> [root@terra ~]# qconf -msconf
>>>
>>> algorithm                         default
>>> schedule_interval                 0:0:15
>>> maxujobs                          0
>>> queue_sort_method                 seqno
>>> job_load_adjustments              np_load_avg=0.50
>>> load_adjustment_decay_time        0:7:30
>>> load_formula                      slots
>>> schedd_job_info                   true
>>> flush_submit_sec                  0
>>> flush_finish_sec                  0
>>> params                            none
>>> reprioritize_interval             0:0:0
>>> halftime                          168
>>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>>> compensation_factor               5.000000
>>> weight_user                       0.250000
>>> weight_project                    0.250000
>>> weight_department                 0.250000
>>> weight_job                        0.250000
>>> weight_tickets_functional         0
>>> weight_tickets_share              0
>>> share_override_tickets            TRUE
>>> share_functional_shares           TRUE
>>> max_functional_jobs_to_schedule   200
>>> report_pjob_tickets               TRUE
>>> max_pending_tasks_per_job         50
>>> halflife_decay_list               none
>>> policy_hierarchy                  OFS
>>> weight_ticket                     0.010000
>>> weight_waiting_time               0.000000
>>> weight_deadline                   3600000.000000
>>> weight_urgency                    0.100000
>>> weight_priority                   1.000000
>>> max_reservation                   1024
>>> default_duration                  96:00:00
>>>
>>> rayson wrote:
>>>> I think you can play with the "allocation_rule" in your PE setting,
>>>> esp. the "$fill_up" flag:
>>>>
>>>> http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/htmlman/htmlman5/sge_pe.html
>>>>
>>>> Rayson
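Paraphrasing the allocation_rule values that man page documents (this summary
is not part of the quoted message):

    allocation_rule  <n>             # fixed number of slots per host, e.g. 8
    allocation_rule  $pe_slots       # all slots of a job on a single host
    allocation_rule  $fill_up        # fill one host completely before moving on
    allocation_rule  $round_robin    # distribute slots one at a time across hosts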
>>>>
>>>>
>>>>
>>>> On 11/18/08, Robert Healey <healer at rpi.edu> wrote:
>>>>> Hello.
>>>>>
>>>>> I'm currently running Grid Engine across a 1032-processor/129-node
>>>>> cluster.  Most of the jobs submitted are parallel MPI jobs, with
>>>>> 8-208 slots requested per job.  I've been finding that even with
>>>>> 3-4 idle nodes, an 8-slot job will be split among 2-3 nodes, when
>>>>> the ideal in my circumstances is to run all 8 slots on a single
>>>>> 8-core node.  I've defined all the nodes as having 8 slots, and am
>>>>> looking for things in the scheduler config to tweak to better
>>>>> schedule the CPU time.
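As an illustration of the setup being described (the host name is taken from
the qstat output above; the job script is invented): each node advertises 8
slots, and an 8-slot request is submitted through the MPI parallel environment:

    # Advertise 8 slots on an execution host (repeat per host, or set slots
    # on the queue instead)
    qconf -mattr exechost complex_values slots=8 compute-8-9.local

    # Submit an 8-slot MPI job through the "openmpi" PE
    qsub -pe openmpi 8 myjob.sh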
>>>>>
>>>>> Thank you.
>>>>> --
>>>>> Bob Healey
>>>>> Systems Administrator
>>>>> Physics Department, RPI
>>>>> healer at rpi.edu
>>>>>
>>> -- 
>>> Bob Healey
>>> Systems Administrator
>>> Physics Department, RPI
>>> healer at rpi.edu
>>>
>>
>>
>> ===========End of original message text===========
>>
> 

-- 
Bob Healey
Systems Administrator
Physics Department, RPI
healer at rpi.edu
