#1556 new enhancement (opened 4 years ago)

Resource reservation can block jobs it does not need to

Reported by: markdixon      Owned by:
Priority:    normal         Milestone:
Component:   sge            Version:   8.1.8
Severity:    minor          Keywords:
Cc:

Description

Resource reservation isn't as sophisticated as it could be: it can end up reserving resources it doesn't need to, and in doing so block jobs that could otherwise run. For example:

  • a cluster running version 8.1.8 with 9 slots spread over two hosts (4 + 5) and a seqno queue_sort_method (similar problems occur with the load method, but seqno is more predictable; the scheduler settings assumed here are sketched just after this list):
    [test1@login1.quack ~]$ qstat -f
    queuename                      qtype resv/used/tot. load_avg arch          states
    ---------------------------------------------------------------------------------
    polaris1.q@compute1.quack.leed BIP   0/0/4          0.00     lx-amd64      
    ---------------------------------------------------------------------------------
    polaris1.q@compute2.quack.leed BIP   0/0/5          0.10     lx-amd64      
    
  • From an empty cluster, submitting two 6-slot jobs and then a 3-slot job results in maximal utilisation (job 2114651 leaps ahead of 2114650):
    [test1@login1.quack ~]$ qsub -l h_rt=1:0:0 -R y -pe ib 6 sleep_time.sh 3600
    Your job 2114649 ("sleep_time.sh") has been submitted
    
    [test1@login1.quack ~]$ qsub -l h_rt=1:0:0 -R y -pe ib 6 sleep_time.sh 3600
    Your job 2114650 ("sleep_time.sh") has been submitted
    
    [test1@login1.quack ~]$ qsub -l h_rt=1:0:0 -R y -pe ib 3 sleep_time.sh 3600
    Your job 2114651 ("sleep_time.sh") has been submitted
    
    [test1@login1.quack ~]$ qstat
    job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
    -----------------------------------------------------------------------------------------------------------------
    2114649 0.50050 sleep_time test1        r     09/04/2015 17:01:00 polaris1.q@compute1.quack.leed     6        
    2114651 0.50050 sleep_time test1        r     09/04/2015 17:01:04 polaris1.q@compute2.quack.leed     3        
    2114650 0.50050 sleep_time test1        qw    09/04/2015 17:01:00                                    6        
    
  • But throwing a 3-slot job into the mix first (it is deleted again below) causes the final 3-slot job to be blocked:
    [test1@login1.quack ~]$ qsub -l h_rt=1:0:0 -R y -pe ib 3 sleep_time.sh 3600
    Your job 2114655 ("sleep_time.sh") has been submitted
    
    [test1@login1.quack ~]$ qsub -l h_rt=1:0:0 -R y -pe ib 6 sleep_time.sh 3600
    Your job 2114656 ("sleep_time.sh") has been submitted
    
    [test1@login1.quack ~]$ qsub -l h_rt=1:0:0 -R y -pe ib 6 sleep_time.sh 3600
    Your job 2114657 ("sleep_time.sh") has been submitted
    
    [test1@login1.quack ~]$ qsub -l h_rt=1:0:0 -R y -pe ib 3 sleep_time.sh 3600
    Your job 2114658 ("sleep_time.sh") has been submitted
    
    [test1@login1.quack ~]$ qdel 2114655
    test1 has registered the job 2114655 for deletion
    
    [test1@login1.quack ~]$ qstat -g c
    CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
    --------------------------------------------------------------------------------
    polaris1.q                        0.00      6      0      3      9      0      0 
    
    [test1@login1.quack ~]$ qstat
    job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
    -----------------------------------------------------------------------------------------------------------------
    2114656 0.50050 sleep_time test1        r     09/04/2015 17:03:25 polaris1.q@compute1.quack.leed     6        
    2114657 0.50050 sleep_time test1        qw    09/04/2015 17:03:27                                    6        
    2114658 0.50050 sleep_time test1        qw    09/04/2015 17:03:29                                    3        
    
    

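For reference, reservation only comes into play when jobs request it with -R y (as above) and the scheduler's max_reservation is non-zero; the per-host breakdown quoted further down appears to be the schedule monitoring file that the scheduler writes when MONITOR=1 is set. The snippet below is only a sketch of the kind of scheduler configuration assumed here, queried with qconf -ssconf; apart from queue_sort_method, the values are illustrative rather than taken from this cluster:

    $ qconf -ssconf | egrep 'queue_sort_method|max_reservation|default_duration|params'
    queue_sort_method                 seqno
    max_reservation                   32
    default_duration                  8760:00:00
    params                            MONITOR=1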
Here, job 2114658 should leap ahead of 2114657. It doesn't, because Grid Engine decided to reserve the 3 free slots plus 3 of the slots held by job 2114656, instead of all 6 of 2114656's slots (see the sketch after the dump below):

::::::::
2114656:1:RUNNING:1441382605:3660:P:ib:slots:6.000000
2114656:1:RUNNING:1441382605:3660:H:compute1.quack.leeds.ac.uk:h_vmem:1073741824.000000
2114656:1:RUNNING:1441382605:3660:H:compute1.quack.leeds.ac.uk:exclusive:1.000000
2114656:1:RUNNING:1441382605:3660:Q:polaris1.q@compute1.quack.leeds.ac.uk:slots:1.000000
2114656:1:RUNNING:1441382605:3660:H:compute2.quack.leeds.ac.uk:h_vmem:5368709120.000000
2114656:1:RUNNING:1441382605:3660:H:compute2.quack.leeds.ac.uk:exclusive:5.000000
2114656:1:RUNNING:1441382605:3660:Q:polaris1.q@compute2.quack.leeds.ac.uk:slots:5.000000
2114657:1:RESERVING:1441386265:3660:P:ib:slots:6.000000
2114657:1:RESERVING:1441386265:3660:H:compute1.quack.leeds.ac.uk:h_vmem:4294967296.000000
2114657:1:RESERVING:1441386265:3660:H:compute1.quack.leeds.ac.uk:exclusive:4.000000
2114657:1:RESERVING:1441386265:3660:Q:polaris1.q@compute1.quack.leeds.ac.uk:slots:4.000000
2114657:1:RESERVING:1441386265:3660:H:compute2.quack.leeds.ac.uk:h_vmem:2147483648.000000
2114657:1:RESERVING:1441386265:3660:H:compute2.quack.leeds.ac.uk:exclusive:2.000000
2114657:1:RESERVING:1441386265:3660:Q:polaris1.q@compute2.quack.leeds.ac.uk:slots:2.000000
2114658:1:RESERVING:1440835200:3660:P:ib:slots:3.000000
2114658:1:RESERVING:1440835200:3660:H:compute1.quack.leeds.ac.uk:h_vmem:3221225472.000000
2114658:1:RESERVING:1440835200:3660:H:compute1.quack.leeds.ac.uk:exclusive:3.000000
2114658:1:RESERVING:1440835200:3660:Q:polaris1.q@compute1.quack.leeds.ac.uk:slots:3.000000

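To make the effect concrete, here is a small illustrative sketch (plain Python, not Grid Engine code; host names and slot counts are taken from the output above, everything else is made up for the example). It checks whether 2114658 could start now on the 3 free slots without delaying reserved job 2114657, both for the placement the scheduler actually chose and for the alternative of reserving exactly the slots 2114656 will free. Since both jobs request h_rt=1:0:0, 2114658 would still be running when 2114656 finishes, so the reservation has to stay satisfiable at that point:

    total = {"compute1": 4, "compute2": 5}    # slots per host (qstat -f above)
    free  = {"compute1": 3, "compute2": 0}    # free right now; 2114656 holds 1 + 5

    def backfill_ok(candidate, reservation):
        """Could `candidate` (slots per host) start on free slots now without
        making the reservation unsatisfiable once 2114656 has finished?"""
        for host in total:
            if candidate.get(host, 0) > free[host]:        # must fit right now
                return False
            # When 2114656 ends, every slot not used by the backfilled job is free.
            if total[host] - candidate.get(host, 0) < reservation.get(host, 0):
                return False
        return True

    candidate   = {"compute1": 3}                  # 2114658 on the 3 currently free slots
    actual      = {"compute1": 4, "compute2": 2}   # reservation in the dump above
    alternative = {"compute1": 1, "compute2": 5}   # reserve exactly 2114656's slots

    print(backfill_ok(candidate, actual))        # False -> 2114658 has to stay queued
    print(backfill_ok(candidate, alternative))   # True  -> 2114658 could run now

Under the actual placement compute1 would have only 1 slot available when the reservation needs 4 there, so backfilling 2114658 would delay 2114657; under the alternative, the reservation never needs the currently free slots at all.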
I know scheduling is hard, and even harder to do quickly, but it might be possible to do this better (even if the general case isn't solved).

This is a noticeable problem where job sizes are a large fraction of the available resources.
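One hypothetical tie-breaker, sketched in the same style (this is not what the scheduler currently does): when several placements satisfy a reservation at the same start time, prefer the one that ties up the fewest currently-free slots, i.e. the one that overlaps as much as possible with slots that are about to be released anyway. On the example above that picks the placement sitting entirely on 2114656's slots and leaves the 3 free slots for 2114658:

    freed_by_2114656 = {"compute1": 1, "compute2": 5}   # slots released when it ends

    def free_slots_tied_up(reservation):
        """Currently-free slots the reservation would claim on each host:
        whatever it needs beyond what the finishing job releases there."""
        return sum(max(0, n - freed_by_2114656.get(host, 0))
                   for host, n in reservation.items())

    candidates = [
        {"compute1": 4, "compute2": 2},   # placement actually chosen: ties up 3 free slots
        {"compute1": 1, "compute2": 5},   # alternative: ties up none
    ]
    print(min(candidates, key=free_slots_tied_up))   # {'compute1': 1, 'compute2': 5}

This obviously ignores multiple reserving jobs, differing end times and resources other than slots, but a heuristic of roughly this shape would avoid the particular blocking seen here.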
