[GE users] SGE PE+Scheduler problem

reuti reuti at staff.uni-marburg.de
Thu Aug 5 16:20:34 BST 2010


Hi,

On 05.08.2010 at 00:09, lvta0909 wrote:

> I'm trying to figure out why the $pe_hostfile generated by SGE isn't
> taking np_load_avg (as configured in the scheduler) into account when
> selecting the nodes for executing parallel jobs.
> The scheduler works fine when I run independent jobs, but as one
> can see in the snippet below it doesn't care about the load when
> selecting hosts for a PE.
> 
> What am I missing?

nothing.

For parallel jobs the load is not taken into account; only the allocation_rule is followed:

"Well, having setup the scheduler this way, one might wounder how this setting works together with the parallel environment (pe) allocation rule. The default setting is, what ever is specified in the pe, overwrites the scheduler configuration. Only if  "pe_slots"is set as an allocation rule,  the scheduler configuration is used."

from: http://blogs.sun.com/sgrell/entry/grid_engine_scheduler_hacks_least
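
Just as an illustration (a sketch, not a tested recommendation): with "$fill_up" the load_formula is ignored when the slave hosts for a parallel job are chosen, so the placement you see below is expected behaviour. If load-aware sorting is really wanted, the PE's allocation rule would have to be changed, e.g.:

    # show the current allocation rule of the "orte" PE
    qconf -sp orte | grep allocation_rule

    # open the PE definition in an editor and set e.g.
    #   allocation_rule    $pe_slots
    qconf -mp orte

Note the trade-off: with "$pe_slots" all slots of a job must come from a single host, so a 12-slot job could no longer start on the 4-slot nodes in this cluster.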

-- Reuti


> 
> tyvm
> 
> 
> ---snip---
> sge version: 62u2
> 
> tiagogomes@cluster-lps$ cat test.sh
> #!/bin/bash
> #$ -V -cwd
> echo "The Job ID of this job is $JOB_ID"
> echo "The pe host file follows:"
> cat $PE_HOSTFILE
> 
> tiagogomes@cluster-lps$ qstat -f
> queuename                          qtype resv/used/tot. load_avg arch
>   states
> ---------------------------------------------------------------------------------
> all.q@compute-0-0.local            BIP   0/0/4          2.03     lx26-amd64
> ---------------------------------------------------------------------------------
> all.q@compute-0-1.local            BIP   0/0/4          0.00     lx26-amd64
> ---------------------------------------------------------------------------------
> all.q@compute-0-2.local            BIP   0/0/4          0.00     lx26-amd64
> ---------------------------------------------------------------------------------
> all.q@compute-0-3.local            BIP   0/0/4          0.00     lx26-amd64
> ---------------------------------------------------------------------------------
> all.q@compute-0-4.local            BIP   0/0/4          0.00     lx26-amd64
> ---------------------------------------------------------------------------------
> all.q@compute-0-5.local            BIP   0/0/4          1.98     lx26-amd64
> ---------------------------------------------------------------------------------
> all.q@compute-0-6.local            BIP   0/1/4          0.00     lx26-amd64
> ---------------------------------------------------------------------------------
> all.q@compute-0-7.local            BIP   0/0/4          0.00     lx26-amd64
> ---------------------------------------------------------------------------------
> all.q@compute-0-8.local            BIP   0/0/4          0.00     lx26-amd64
> 
> 
> tiagogomes@cluster-lps$ qsub -pe orte 12 test.sh
> 
> tiagogomes@cluster-lps$ cat test.sh.o5516
> Warning: no access to tty (Bad file descriptor).
> Thus no job control in this shell.
> The Job ID of this job is 5516
> The pe host file follows:
> compute-0-0.local 4 all.q@compute-0-0.local <NULL>
> compute-0-5.local 4 all.q@compute-0-5.local <NULL>
> compute-0-4.local 4 all.q@compute-0-4.local <NULL>
> 
> 
> some configurations:
> tiagogomes@cluster-lps$ qconf -ssconf
> algorithm                         default
> schedule_interval                 0:0:15
> maxujobs                          0
> queue_sort_method                 load
> job_load_adjustments              np_load_avg=0.25
> load_adjustment_decay_time        00:05:00
> load_formula                      np_load_avg
> schedd_job_info                   true
> flush_submit_sec                  0
> flush_finish_sec                  0
> params                            none
> reprioritize_interval             0:0:0
> halftime                          168
> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> compensation_factor               5.000000
> weight_user                       0.250000
> weight_project                    0.250000
> weight_department                 0.250000
> weight_job                        0.250000
> weight_tickets_functional         0
> weight_tickets_share              0
> share_override_tickets            TRUE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   200
> report_pjob_tickets               TRUE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  OFS
> weight_ticket                     0.010000
> weight_waiting_time               0.000000
> weight_deadline                   3600000.000000
> weight_urgency                    0.100000
> weight_priority                   1.000000
> max_reservation                   0
> default_duration                  INFINITY
> tiagogomes@cluster-lps$ qconf -sp orte
> pe_name            orte
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $fill_up
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary TRUE
> 
> -- end snip --
> 
> Regards,
> 
> --
> Tiago Bitarelli Gomes
> lvta0909 at gmail.com
