[GE users] SGE PE+Scheduler problem

lvta0909 lvta0909@gmail.com
Wed Aug 4 23:09:59 BST 2010



Hi,

I'm trying to figure out why the $pe_hostfile generated by SGE isn't
taking np_load_avg (as configured in the scheduler) into account when
selecting the nodes for executing parallel jobs.
The scheduler works fine when I run independent (serial) jobs, but as
one can see in the snippet below, it doesn't seem to take the load
into account when selecting hosts for a PE job.
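
For reference, this is the check I use to see the per-host value that
load_formula should sort on (as far as I understand, qstat -F prints
the current value of the named complex under each queue instance):

tiagogomes@cluster-lps$ qstat -F np_load_avg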

What am I missing?

Thanks very much.


---snip---
SGE version: 6.2u2

tiagogomes@cluster-lps$ cat test.sh
#!/bin/bash
#$ -V -cwd
echo "The Job ID of this job is $JOB_ID"
echo "The pe host file follows:"
cat $PE_HOSTFILE
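
For context, each line of $PE_HOSTFILE should hold four fields: host
name, number of slots granted on it, queue instance, and a processor
range (printed as <NULL> when no binding is set). So a job script can
also walk it field by field; a minimal sketch, not part of the test
above:

while read host slots queue range; do
    echo "granted $slots slot(s) on $host in $queue"
done < "$PE_HOSTFILE"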

tiagogomes@cluster-lps$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@compute-0-0.local        BIP   0/0/4          2.03     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-1.local        BIP   0/0/4          0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-2.local        BIP   0/0/4          0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-3.local        BIP   0/0/4          0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-4.local        BIP   0/0/4          0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-5.local        BIP   0/0/4          1.98     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-6.local        BIP   0/1/4          0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-7.local        BIP   0/0/4          0.00     lx26-amd64
---------------------------------------------------------------------------------
all.q@compute-0-8.local        BIP   0/0/4          0.00     lx26-amd64
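
For clarity, np_load_avg is load_avg normalized by the host's
processor count. Assuming the 4 slots above correspond to 4 cores per
node, compute-0-0 should be reporting about np_load_avg = 2.03 / 4 =
0.51 and compute-0-5 about 1.98 / 4 = 0.50, while the remaining nodes
report 0.00, so a sort on np_load_avg ought to rank the idle nodes
first.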


tiagogomes@cluster-lps$ qsub -pe orte 12 test.sh

tiagogomes@cluster-lps$ cat test.sh.o5516
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
The Job ID of this job is 5516
The pe host file follows:
compute-0-0.local 4 all.q@compute-0-0.local <NULL>
compute-0-5.local 4 all.q@compute-0-5.local <NULL>
compute-0-4.local 4 all.q@compute-0-4.local <NULL>
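
To cross-check what the scheduler saw for those three hosts, I believe
these commands show the raw and last-reported load values (qconf -se
should list the load_values the execd reported for that host):

tiagogomes@cluster-lps$ qhost
tiagogomes@cluster-lps$ qconf -se compute-0-0.local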


Some relevant configuration:
tiagogomes@cluster-lps$ qconf -ssconf
algorithm                         default
schedule_interval                 0:0:15
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              np_load_avg=0.25
load_adjustment_decay_time        00:05:00
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  0
flush_finish_sec                  0
params                            none
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         0
weight_tickets_share              0
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               TRUE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.010000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.100000
weight_priority                   1.000000
max_reservation                   0
default_duration                  INFINITY
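
If I read job_load_adjustments correctly, np_load_avg=0.25 means every
slot the scheduler grants on a host adds 0.25 to that host's perceived
np_load_avg, decaying over load_adjustment_decay_time (5 minutes
here). So filling 4 slots on an idle node should bump its effective
load to 1.0 right away, which ought to steer the remaining slots of
the same job toward other idle nodes.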
tiagogomes@cluster-lps$ qconf -sp orte
pe_name            orte
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
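
One experiment still on my list (sketch only, not run yet): flip the
allocation rule and resubmit, to see whether the host choice changes
at all. qconf -mp opens the PE definition in an editor:

tiagogomes@cluster-lps$ qconf -mp orte
(change "allocation_rule    $fill_up" to "allocation_rule    $round_robin",
save, and resubmit the job)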

-- end snip --

Regards,

--
Tiago Bitarelli Gomes
lvta0909@gmail.com


