[GE users] Scheduler tuning

Robert Healey healer at rpi.edu
Mon Dec 1 23:55:27 GMT 2008


Nope, here's a typical job submission:

#!/bin/bash

#$ -V
# Takes all the current environment variables and sets them for the current job

#$ -o /data/disk03/leyva/TESTB/SECHZEHN/10mM/Umb846/output -j y

#$ -pe openmpi 8

#$ -q terra
#$ -S /bin/bash
cd /data/disk03/leyva/TESTB/SECHZEHN/10mM/Umb846

/usr/mpi/gcc/openmpi-1.2.6/bin/mpirun -np 8 \
    /usr/local/gromacs-3.3.3/bin/mdrun_sm -np 8 \
    -s input-poro45-ace-arg-nme-10mM-Umb846-run2.tpr \
    -deffnm poro45-ace-arg-nme-10mM-Umb846-run2 \
    -c poro45-ace-arg-nme-10mM-Umb846-run2.pdb \
    -dlb -dd 2 2 2 \
    -pi Umbrellas846.ppa -pd Umb846-pd-run2 -po Umb846-po-run2 \
    -pn backbone.ndx -v
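
For what it's worth, one way to force a request like "-pe openmpi 8" onto a
single node is a second PE whose allocation_rule is $pe_slots rather than
$fill_up; $pe_slots only grants a job slots from one host.  A rough sketch
(the PE name "openmpi_smp" is made up here):

    qconf -sp openmpi > openmpi_smp.conf
    # edit pe_name to openmpi_smp and allocation_rule to $pe_slots, then:
    qconf -Ap openmpi_smp.conf
    qconf -aattr queue pe_list openmpi_smp terra
    # and request it in the job script:
    #$ -pe openmpi_smp 8

The trade-off is that such a job stays pending until some node has all 8
slots free at once.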

reuti wrote:
> Am 01.12.2008 um 17:20 schrieb Robert Healey:
> 
>> I'm using fill_up and my end users are still complaining about jobs
>> being spread out across the various nodes.  Does anyone have any other
>> suggestions on how to remedy this?
> 
> Are there any resource requests that can't be fulfilled when running
> on a node, because of limited memory or the like?
> 
> -- Reuti
> 
> 
>> Margaret Doll wrote:
>>> I found that if I had my parallel environment set to "round-robin", I
>>> got into the situation that you describe.
>>>
>>> I switched the PE to "fill_up", and now the first compute node's slots
>>> are all used before requests are made to the next compute node.
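>>>
>>> If it helps, switching the rule is a one-line change in the PE
>>> definition; a rough sketch, assuming the PE is named "openmpi":
>>>
>>>    qconf -mp openmpi
>>>    # then change:
>>>    allocation_rule    $fill_up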
>>>
>>>
>>> On Nov 19, 2008, at 12:09 PM, Bob Healey wrote:
>>>
>>>> Thank you to everyone who responded overnight.  I've taken the
>>>> suggestions from the three emails I saw today and will be trying them
>>>> out; if nothing changes, I'll post again.  For the person who asked,
>>>> the only resource being requested is slots.  I still haven't gotten
>>>> the end users to put in h_rt limits yet, so I can't get backfilling
>>>> working.  But that's a people problem, not a tech issue.
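>>>>
>>>> For the record, once users do supply runtimes, backfilling also needs
>>>> resource reservation enabled for the large jobs; max_reservation is
>>>> already 1024 in the scheduler config, so it would roughly be a matter
>>>> of submitting along the lines of:
>>>>
>>>>    qsub -pe openmpi 64 -l h_rt=24:00:00 -R y job.sh
>>>>
>>>> where the slot count, runtime, and script name are just placeholders.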
>>>>
>>>> Bob Healey
>>>> Systems Administrator
>>>> Molecularium Project
>>>> Department of Physics, Applied Physics, and Astronomy
>>>> healer at rpi.edu
>>>>
>>>> ==============Original message text===============
>>>> On Wed, 19 Nov 2008 5:11:41 EST andreas wrote:
>>>>
>>>> Hi Robert,
>>>>
>>>> I don't have a full picture of your setup, but load adjustment has a
>>>> say in parallel job allocation.  Try replacing
>>>>
>>>>> job_load_adjustments              np_load_avg=0.50
>>>>> load_adjustment_decay_time        0:7:30
>>>> with
>>>>
>>>>> job_load_adjustments              NONE
>>>>> load_adjustment_decay_time        0:0:0
>>>> It is possible that scheduling will then work as you expect.
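>>>>
>>>> Both values live in the scheduler configuration, so applying this is
>>>> roughly:
>>>>
>>>>    qconf -msconf        # opens the scheduler config in an editor
>>>>    # then set:
>>>>    job_load_adjustments              NONE
>>>>    load_adjustment_decay_time        0:0:0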
>>>>
>>>> Regards,
>>>> Andreas
>>>>
>>>> On Wed, 19 Nov 2008, Robert Healey wrote:
>>>>
>>>>> I'm currently using that flag; it doesn't seem to help much.  I also
>>>>> use slots as the scheduling criterion instead of load.
>>>>>
>>>>> Bob Healey
>>>>>
>>>>>
>>>>> qstat -t:
>>>>>  12645 0.51180 submit-run leyva   r   11/18/2008 16:02:12
>>>>>      terra@compute-8-9.local   MASTER                  r  00:00:02    0.15357      0.00000
>>>>>      terra@compute-8-9.local   SLAVE   1.compute-8-9   r  1:17:38:46  51351.78424  0.00000
>>>>>      terra@compute-8-9.local   SLAVE
>>>>>      terra@compute-8-9.local   SLAVE
>>>>>      terra@compute-8-9.local   SLAVE
>>>>>  12645 0.51180 submit-run leyva   r   11/18/2008 16:02:12
>>>>>      terra@compute-8-10.local  SLAVE
>>>>>      terra@compute-8-10.local  SLAVE   1.compute-8-10  r  1:17:41:06  51175.77672  0.00000
>>>>>      terra@compute-8-10.local  SLAVE
>>>>>      terra@compute-8-10.local  SLAVE
>>>>>
>>>>> pe_name            openmpi
>>>>> slots              1310
>>>>> user_lists         NONE
>>>>> xuser_lists        NONE
>>>>> start_proc_args    /bin/true
>>>>> stop_proc_args     /bin/true
>>>>> allocation_rule    $fill_up
>>>>> control_slaves     TRUE
>>>>> job_is_first_task  FALSE
>>>>> urgency_slots      min
>>>>> accounting_summary TRUE
>>>>>
>>>>> [root@terra ~]# qconf -msconf
>>>>>
>>>>> algorithm                         default
>>>>> schedule_interval                 0:0:15
>>>>> maxujobs                          0
>>>>> queue_sort_method                 seqno
>>>>> job_load_adjustments              np_load_avg=0.50
>>>>> load_adjustment_decay_time        0:7:30
>>>>> load_formula                      slots
>>>>> schedd_job_info                   true
>>>>> flush_submit_sec                  0
>>>>> flush_finish_sec                  0
>>>>> params                            none
>>>>> reprioritize_interval             0:0:0
>>>>> halftime                          168
>>>>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>>>>> compensation_factor               5.000000
>>>>> weight_user                       0.250000
>>>>> weight_project                    0.250000
>>>>> weight_department                 0.250000
>>>>> weight_job                        0.250000
>>>>> weight_tickets_functional         0
>>>>> weight_tickets_share              0
>>>>> share_override_tickets            TRUE
>>>>> share_functional_shares           TRUE
>>>>> max_functional_jobs_to_schedule   200
>>>>> report_pjob_tickets               TRUE
>>>>> max_pending_tasks_per_job         50
>>>>> halflife_decay_list               none
>>>>> policy_hierarchy                  OFS
>>>>> weight_ticket                     0.010000
>>>>> weight_waiting_time               0.000000
>>>>> weight_deadline                   3600000.000000
>>>>> weight_urgency                    0.100000
>>>>> weight_priority                   1.000000
>>>>> max_reservation                   1024
>>>>> default_duration                  96:00:00
>>>>>
>>>>> rayson wrote:
>>>>>> I think you can play with the "allocation_rule" in your PE setting,
>>>>>> esp. the "$fill_up" flag:
>>>>>>
>>>>>> http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/htmlman/htmlman5/sge_pe.html
>>>>>> Rayson
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/18/08, Robert Healey <healer at rpi.edu> wrote:
>>>>>>> Hello.
>>>>>>>
>>>>>>> I'm currently running Grid Engine across a 1032-processor/129-node
>>>>>>> cluster.  Most of my submitted jobs are parallel MPI jobs, with
>>>>>>> 8-208 slots requested per job.  I've been finding that even with
>>>>>>> 3-4 idle nodes, an 8-slot job will be split among 2-3 nodes, when
>>>>>>> the ideal in my circumstances is to run all 8 slots on a single
>>>>>>> 8-core node.  I've defined all the nodes as having 8 slots, and am
>>>>>>> looking for things in the scheduler config to tweak to better
>>>>>>> schedule the CPU time.
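>>>>>>>
>>>>>>> (One thing worth double-checking while tuning is how many slots
>>>>>>> each node is actually advertising as free, in case load thresholds
>>>>>>> are masking some of them; for example:
>>>>>>>
>>>>>>>    qstat -f -q terra     # per-queue-instance slot usage and states
>>>>>>>    qstat -g c            # used/available slots per cluster queue
>>>>>>>
>>>>>>> The queue name "terra" here is just an example.)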
>>>>>>>
>>>>>>> Thank you.
>>>>>>> --
>>>>>>> Bob Healey
>>>>>>> Systems Administrator
>>>>>>> Physics Department, RPI
>>>>>>> healer at rpi.edu
>>>>>>>
>>>>> -- 
>>>>> Bob Healey
>>>>> Systems Administrator
>>>>> Physics Department, RPI
>>>>> healer at rpi.edu
>>>>>
>>>>
>>>> ===========End of original message text===========
>>>>
>>>
>>>
>> -- 
>> Bob Healey
>> Systems Administrator
>> Physics Department, RPI
>> healer at rpi.edu
>>
> 
> 

-- 
Bob Healey
Systems Administrator
Physics Department, RPI
healer at rpi.edu



