[GE users] 6.2u4: resource reservation not working for some jobs

reuti reuti at staff.uni-marburg.de
Mon Jan 11 14:45:45 GMT 2010


On 11.01.2010 at 14:09, ccaamad wrote:

> Hi,
>
> I'm seeing a strange problem with resource reservation and I wondered if
> anyone else can offer any advice.
>
> It's the usual scenario: a user is submitting parallel jobs and we want to
> avoid them being starved of resources by smaller, lower priority jobs.
>
> * I've got a consumable resource on h_vmem, default 1G.
> * I'm enforcing h_rt to be explicitly requested.
> * I've got a custom complex "cputype".
> * I've got a custom complex "island".
> * Queues have a maximum h_rt set of 48:00:00.
>
> Weirdly, resources are not being reserved if I do:
>
> $ qsub -clear -cwd -R y -l h_vmem=1G,h_rt=48:00:00,cputype=intel -pe ib 512 wait.sh
>
> Whereas they ARE for the following very similar requests:
>
> $ qsub -clear -cwd -R y -l h_vmem=1024M,h_rt=48:00:00,cputype=intel -pe ib 512 wait.sh
> $ qsub -clear -cwd -R y -l h_vmem=1G,h_rt=48:00:00,cputype=intel\* -pe ib 512 wait.sh
> $ qsub -clear -cwd -R y -l h_vmem=1G,h_rt=47:59:59,cputype=intel -pe ib 512 wait.sh
> $ qsub -clear -cwd -R y -l h_vmem=1G,h_rt=48:00:00 -pe ib 512 wait.sh
> $ qsub -clear -cwd -R y -l h_rt=48:00:00,cputype=intel -pe ib 512 wait.sh
>
> I got the same results if I upped the queue h_rt to 48:00:01.
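On paper, the failing request and the working variants ask for the same thing. A minimal sketch to make that concrete, assuming the documented Grid Engine multipliers (upper-case K/M/G are powers of 1024, lower-case k/m/g are powers of 1000) and integer values only. The helper names mem_to_bytes and time_to_seconds are hypothetical and not part of Grid Engine:

```shell
# Convert a Grid Engine style memory value (e.g. 1G, 1024M) to bytes.
# Integer values only; a value without a suffix is returned unchanged.
mem_to_bytes() {
    value=$1
    case $value in
        *K) echo $(( ${value%K} * 1024 )) ;;
        *M) echo $(( ${value%M} * 1024 * 1024 )) ;;
        *G) echo $(( ${value%G} * 1024 * 1024 * 1024 )) ;;
        *k) echo $(( ${value%k} * 1000 )) ;;
        *m) echo $(( ${value%m} * 1000 * 1000 )) ;;
        *g) echo $(( ${value%g} * 1000 * 1000 * 1000 )) ;;
        *)  echo "$value" ;;
    esac
}

# Convert an hh:mm:ss time value to seconds.
# awk treats the fields as decimal, so leading zeros are harmless.
time_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

mem_to_bytes 1G
mem_to_bytes 1024M
time_to_seconds 48:00:00
time_to_seconds 47:59:59
```

The two memory values come out as the same byte count, and the two run times differ by a single second, so the different scheduling behaviour is not explained by the requested amounts themselves.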
>
> All systems that are up have 8 slots and the following complexes:
>
>    h_vmem=12G
>    exclusive=true
>    cputype=intel
>    island= (varies depending on host)
>
> I'm running with the courtesy ge62u4_lx24-amd64 binaries.
>
> Any ideas on what is going on, please?

Did you set up any RQS?

-- Reuti
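
To answer this, any resource quota sets (RQS) can be listed with qconf. A minimal sketch, assuming qconf is on the PATH of an admin or submit host; show_rqs is a hypothetical wrapper name, while -srqsl and -srqs are the qconf options for listing and showing resource quota sets:

```shell
# Print the names and definitions of all resource quota sets.
# Returns 127 if qconf is not available on this host.
show_rqs() {
    if ! command -v qconf >/dev/null 2>&1; then
        echo "qconf not found in PATH" >&2
        return 127
    fi
    qconf -srqsl   # names of all defined resource quota sets
    qconf -srqs    # full definitions of those sets
}
```

Calling show_rqs on the qmaster host and posting the output would show whether any quota rule touches h_vmem, h_rt or the ib parallel environment.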


> Thanks,
>
> Mark
> -- 
> -----------------------------------------------------------------
> Mark Dixon                       Email    : m.c.dixon at leeds.ac.uk
> HPC/Grid Systems Support         Tel (int): 35429
> Information Systems Services     Tel (ext): +44(0)113 343 5429
> University of Leeds, LS2 9JT, UK
> -----------------------------------------------------------------
>
> $ qstat -g c
> CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
> --------------------------------------------------------------------------------
> c1s0.q                            0.89    134      0     26    192      0     48
> c1s1.q                            0.00     14      0     32     96      0     64
> c2s0.q                            0.37     64      0    128    192      0      0
> c3s0.q                            0.75    128      0     64    192      0      0
> smp.q                             -NA-      0      0      0     64      0     64
> $ qconf -ssconf
> algorithm                         default
> schedule_interval                 0:0:1
> maxujobs                          0
> queue_sort_method                 seqno
> job_load_adjustments              NONE
> load_adjustment_decay_time        0:0:0
> load_formula                      np_load_avg
> schedd_job_info                   true
> flush_submit_sec                  0
> flush_finish_sec                  0
> params                            MONITOR=true
> reprioritize_interval             0:0:0
> halftime                          168
> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> compensation_factor               5.000000
> weight_user                       0.250000
> weight_project                    0.250000
> weight_department                 0.250000
> weight_job                        0.250000
> weight_tickets_functional         0
> weight_tickets_share              10000
> share_override_tickets            TRUE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   200
> report_pjob_tickets               TRUE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  OFS
> weight_ticket                     0.010000
> weight_waiting_time               0.000000
> weight_deadline                   3600000.000000
> weight_urgency                    0.100000
> weight_priority                   1.000000
> max_reservation                   32
> default_duration                  INFINITY
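
With MONITOR=true in params, as above, the scheduler records each scheduling run in a schedule file under the cell's common directory, which is one way to see whether a job is actually getting a reservation. A minimal sketch, assuming the file lives at $SGE_ROOT/$SGE_CELL/common/schedule and that its records are colon-separated with the state in the third field; list_reservations is a hypothetical helper name:

```shell
# List reservation records from a scheduler monitoring file.
# Assumed record layout (colon-separated):
#   jobid:taskid:state:start_time:duration:level:object:resource:utilization
# The first argument overrides the assumed default location of the file.
list_reservations() {
    schedule_file=${1:-$SGE_ROOT/${SGE_CELL:-default}/common/schedule}
    # Keep only records whose state field is exactly RESERVING.
    awk -F: '$3 == "RESERVING"' "$schedule_file"
}
```

Running list_reservations after submitting each variant of the job would show which requests produce RESERVING records and which do not.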
>
> $ qconf -sc
> #name               shortcut   type        relop   requestable consumable default  urgency
> #------------------------------------------------------------------------------------------
> arch                a          RESTRING    ==      YES         NO         NONE     0
> calendar            c          RESTRING    ==      YES         NO         NONE     0
> cpu                 cpu        DOUBLE      >=      YES         NO         0        0
> cputype             cputype    RESTRING    ==      YES         NO         NONE     0
> display_win_gui     dwg        BOOL        ==      YES         NO         0        0
> exclusive           excl       BOOL        EXCL    YES         YES        0        1000
> h_core              h_core     MEMORY      <=      YES         NO         0        0
> h_cpu               h_cpu      TIME        <=      YES         NO         0:0:0    0
> h_data              h_data     MEMORY      <=      YES         NO         0        0
> h_fsize             h_fsize    MEMORY      <=      YES         NO         0        0
> h_rss               h_rss      MEMORY      <=      YES         NO         0        0
> h_rt                h_rt       TIME        <=      FORCED      NO         0:0:0    0
> h_stack             h_stack    MEMORY      <=      YES         NO         0        0
> h_vmem              h_vmem     MEMORY      <=      YES         YES        1G       0
> hostname            h          HOST        ==      YES         NO         NONE     0
> island              island     RESTRING    ==      YES         NO         NONE     0
> load_avg            la         DOUBLE      >=      NO          NO         0        0
> load_long           ll         DOUBLE      >=      NO          NO         0        0
> load_medium         lm         DOUBLE      >=      NO          NO         0        0
> load_short          ls         DOUBLE      >=      NO          NO         0        0
> mem_free            mf         MEMORY      <=      YES         NO         0        0
> mem_total           mt         MEMORY      <=      YES         NO         0        0
> mem_used            mu         MEMORY      >=      YES         NO         0        0
> min_cpu_interval    mci        TIME        <=      NO          NO         0:0:0    0
> np_load_avg         nla        DOUBLE      >=      NO          NO         0        0
> np_load_long        nll        DOUBLE      >=      NO          NO         0        0
> np_load_medium      nlm        DOUBLE      >=      NO          NO         0        0
> np_load_short       nls        DOUBLE      >=      NO          NO         0        0
> num_proc            p          INT         ==      YES         NO         0        0
> qname               q          RESTRING    ==      YES         NO         NONE     0
> rerun               re         BOOL        ==      NO          NO         0        0
> s_core              s_core     MEMORY      <=      YES         NO         0        0
> s_cpu               s_cpu      TIME        <=      YES         NO         0:0:0    0
> s_data              s_data     MEMORY      <=      YES         NO         0        0
> s_fsize             s_fsize    MEMORY      <=      YES         NO         0        0
> s_rss               s_rss      MEMORY      <=      YES         NO         0        0
> s_rt                s_rt       TIME        <=      YES         NO         0:0:0    0
> s_stack             s_stack    MEMORY      <=      YES         NO         0        0
> s_vmem              s_vmem     MEMORY      <=      YES         NO         0        0
> seq_no              seq        INT         ==      NO          NO         0        0
> slots               s          INT         <=      YES         YES        1        1000
> swap_free           sf         MEMORY      <=      YES         NO         0        0
> swap_rate           sr         MEMORY      >=      YES         NO         0        0
> swap_rsvd           srsv       MEMORY      >=      YES         NO         0        0
> swap_total          st         MEMORY      <=      YES         NO         0        0
> swap_used           su         MEMORY      >=      YES         NO         0        0
> tmpdir              tmp        RESTRING    ==      NO          NO         NONE     0
> virtual_free        vf         MEMORY      <=      YES         NO         0        0
> virtual_total       vt         MEMORY      <=      YES         NO         0        0
> virtual_used        vu         MEMORY      >=      YES         NO         0        0
> # >#< starts a comment but comments are not saved across edits --------
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=238087
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=238108

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


