[GE users] 6.2u4: resource reservation not working for some jobs

ccaamad m.c.dixon at leeds.ac.uk
Mon Jan 11 13:09:53 GMT 2010


Hi,

I'm seeing a strange problem with resource reservation and I wondered if 
anyone else can offer any advice.

It's the usual scenario: a user is submitting parallel jobs and we want to 
stop them being starved of resources by smaller, lower-priority jobs.

* h_vmem is a consumable resource with a default of 1G.
* h_rt has to be requested explicitly (requestable is FORCED).
* I've got a custom complex "cputype".
* I've got a custom complex "island".
* Queues have a maximum h_rt of 48:00:00.

Weirdly, resources are not being reserved if I do:

$ qsub -clear -cwd -R y -l h_vmem=1G,h_rt=48:00:00,cputype=intel -pe ib 512 wait.sh

Whereas they ARE for the following very similar requests:

$ qsub -clear -cwd -R y -l h_vmem=1024M,h_rt=48:00:00,cputype=intel -pe ib 512 wait.sh
$ qsub -clear -cwd -R y -l h_vmem=1G,h_rt=48:00:00,cputype=intel\* -pe ib 512 wait.sh
$ qsub -clear -cwd -R y -l h_vmem=1G,h_rt=47:59:59,cputype=intel -pe ib 512 wait.sh
$ qsub -clear -cwd -R y -l h_vmem=1G,h_rt=48:00:00 -pe ib 512 wait.sh
$ qsub -clear -cwd -R y -l h_rt=48:00:00,cputype=intel -pe ib 512 wait.sh

I get the same results if I raise the queue h_rt limit to 48:00:01.
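
To make the comparison concrete, here is a minimal sketch that reduces the
request values above to plain numbers. It assumes the multipliers described
in sge_types(1) (upper-case K/M/G are powers of 1024, lower-case k/m/g are
powers of 1000) and h:m:s time syntax. The function names are made up for
illustration and are not part of Grid Engine.

```python
# Illustrative helpers only; not part of Grid Engine.
# Assumption: multipliers as documented in sge_types(1).
MULTIPLIERS = {
    "k": 1000, "K": 1024,
    "m": 1000**2, "M": 1024**2,
    "g": 1000**3, "G": 1024**3,
}

def memory_to_bytes(value):
    """Convert a memory request such as '1G' or '1024M' to bytes."""
    if value and value[-1] in MULTIPLIERS:
        return int(float(value[:-1]) * MULTIPLIERS[value[-1]])
    return int(float(value))

def time_to_seconds(value):
    """Convert 'h:m:s' (or a plain number of seconds) to seconds."""
    seconds = 0
    for field in value.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds

# 1G and 1024M are the same number of bytes.
print(memory_to_bytes("1G") == memory_to_bytes("1024M"))   # True
# 47:59:59 is exactly one second below the 48:00:00 queue limit.
print(time_to_seconds("48:00:00") - time_to_seconds("47:59:59"))   # 1
```

Under those assumptions the h_vmem=1G and h_vmem=1024M requests are
numerically identical, so the difference in behaviour would seem to come
from how the request is written rather than from its value.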

All hosts that are up have 8 slots and the following complex values:

   h_vmem=12G
   exclusive=true
   cputype=intel
   island= (varies depending on host)
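
As a sanity check on these numbers, a rough sketch of the arithmetic
follows. It assumes that h_vmem is debited once per slot for a parallel
job and that the "ib" parallel environment can fill every slot on a host;
neither the PE definition nor its allocation rule is shown in this post.

```python
import math

# Values taken from the post.
SLOTS_PER_HOST = 8
HOST_H_VMEM_GIB = 12
JOB_SLOTS = 512
H_VMEM_PER_SLOT_GIB = 1

# Assumption: the PE packs 8 slots onto each host.
hosts_needed = math.ceil(JOB_SLOTS / SLOTS_PER_HOST)

# Assumption: the consumable is charged per slot, so a full host is
# debited slots * per-slot request.
h_vmem_per_full_host = SLOTS_PER_HOST * H_VMEM_PER_SLOT_GIB
fits = h_vmem_per_full_host <= HOST_H_VMEM_GIB

print(hosts_needed, h_vmem_per_full_host, fits)   # 64 8 True
```

So, under those assumptions, the 512-slot job needs 64 complete hosts, and
a 1G-per-slot request fits inside the 12G host limit.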

I'm running with the courtesy ge62u4_lx24-amd64 binaries.

Any ideas on what is going on, please?

Thanks,

Mark
-- 
-----------------------------------------------------------------
Mark Dixon                       Email    : m.c.dixon at leeds.ac.uk
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------

$ qstat -g c
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
c1s0.q                            0.89    134      0     26    192      0     48
c1s1.q                            0.00     14      0     32     96      0     64
c2s0.q                            0.37     64      0    128    192      0      0
c3s0.q                            0.75    128      0     64    192      0      0
smp.q                             -NA-      0      0      0     64      0     64

$ qconf -ssconf
algorithm                         default
schedule_interval                 0:0:1
maxujobs                          0
queue_sort_method                 seqno
job_load_adjustments              NONE
load_adjustment_decay_time        0:0:0
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  0
flush_finish_sec                  0
params                            MONITOR=true
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         0
weight_tickets_share              10000
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               TRUE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.010000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.100000
weight_priority                   1.000000
max_reservation                   32
default_duration                  INFINITY
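
With MONITOR=true set as above, the scheduler should record each
scheduling run in $SGE_ROOT/$SGE_CELL/common/schedule. The sketch below
lists the jobs that have a reservation in such a file. It assumes the
colon-separated record layout described in sched_conf(5), that is
jobid:taskid:state:start_time:duration:level_char:object_name:resource_name:utilization
with "::::::::" separating runs. The function name and the sample records
are made up for illustration.

```python
def reserving_jobs(lines):
    """Return the sorted job ids that have at least one RESERVING record."""
    jobs = set()
    for line in lines:
        fields = line.strip().split(":")
        # Skip run separators ("::::::::") and anything malformed.
        if len(fields) < 9 or not fields[0]:
            continue
        # Assumption: the third field holds the state of the record.
        if fields[2] == "RESERVING":
            jobs.add(int(fields[0]))
    return sorted(jobs)

# Made-up sample records, for illustration only.
sample = [
    "1001:1:RUNNING:1263214800:172800:Q:c1s0.q@node001:slots:8.000000",
    "1002:1:RESERVING:1263387600:172800:H:node001:h_vmem:8589934592.000000",
    "::::::::",
]
print(reserving_jobs(sample))   # [1002]
```

To use it on a live system, pass the lines of the schedule file instead of
the sample list. A job submitted with -R y that never shows up in the
result is one the scheduler is not reserving for.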

$ qconf -sc
#name               shortcut   type        relop   requestable consumable default  urgency
#------------------------------------------------------------------------------------------
arch                a          RESTRING    ==      YES         NO         NONE     0
calendar            c          RESTRING    ==      YES         NO         NONE     0
cpu                 cpu        DOUBLE      >=      YES         NO         0        0
cputype             cputype    RESTRING    ==      YES         NO         NONE     0
display_win_gui     dwg        BOOL        ==      YES         NO         0        0
exclusive           excl       BOOL        EXCL    YES         YES        0        1000
h_core              h_core     MEMORY      <=      YES         NO         0        0
h_cpu               h_cpu      TIME        <=      YES         NO         0:0:0    0
h_data              h_data     MEMORY      <=      YES         NO         0        0
h_fsize             h_fsize    MEMORY      <=      YES         NO         0        0
h_rss               h_rss      MEMORY      <=      YES         NO         0        0
h_rt                h_rt       TIME        <=      FORCED      NO         0:0:0    0
h_stack             h_stack    MEMORY      <=      YES         NO         0        0
h_vmem              h_vmem     MEMORY      <=      YES         YES        1G       0
hostname            h          HOST        ==      YES         NO         NONE     0
island              island     RESTRING    ==      YES         NO         NONE     0
load_avg            la         DOUBLE      >=      NO          NO         0        0
load_long           ll         DOUBLE      >=      NO          NO         0        0
load_medium         lm         DOUBLE      >=      NO          NO         0        0
load_short          ls         DOUBLE      >=      NO          NO         0        0
mem_free            mf         MEMORY      <=      YES         NO         0        0
mem_total           mt         MEMORY      <=      YES         NO         0        0
mem_used            mu         MEMORY      >=      YES         NO         0        0
min_cpu_interval    mci        TIME        <=      NO          NO         0:0:0    0
np_load_avg         nla        DOUBLE      >=      NO          NO         0        0
np_load_long        nll        DOUBLE      >=      NO          NO         0        0
np_load_medium      nlm        DOUBLE      >=      NO          NO         0        0
np_load_short       nls        DOUBLE      >=      NO          NO         0        0
num_proc            p          INT         ==      YES         NO         0        0
qname               q          RESTRING    ==      YES         NO         NONE     0
rerun               re         BOOL        ==      NO          NO         0        0
s_core              s_core     MEMORY      <=      YES         NO         0        0
s_cpu               s_cpu      TIME        <=      YES         NO         0:0:0    0
s_data              s_data     MEMORY      <=      YES         NO         0        0
s_fsize             s_fsize    MEMORY      <=      YES         NO         0        0
s_rss               s_rss      MEMORY      <=      YES         NO         0        0
s_rt                s_rt       TIME        <=      YES         NO         0:0:0    0
s_stack             s_stack    MEMORY      <=      YES         NO         0        0
s_vmem              s_vmem     MEMORY      <=      YES         NO         0        0
seq_no              seq        INT         ==      NO          NO         0        0
slots               s          INT         <=      YES         YES        1        1000
swap_free           sf         MEMORY      <=      YES         NO         0        0
swap_rate           sr         MEMORY      >=      YES         NO         0        0
swap_rsvd           srsv       MEMORY      >=      YES         NO         0        0
swap_total          st         MEMORY      <=      YES         NO         0        0
swap_used           su         MEMORY      >=      YES         NO         0        0
tmpdir              tmp        RESTRING    ==      NO          NO         NONE     0
virtual_free        vf         MEMORY      <=      YES         NO         0        0
virtual_total       vt         MEMORY      <=      YES         NO         0        0
virtual_used        vu         MEMORY      >=      YES         NO         0        0
# >#< starts a comment but comments are not saved across edits --------

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=238087
