[GE users] Advance reservation strange behavior

Reuti reuti at staff.uni-marburg.de
Tue Jun 27 06:25:49 BST 2006


Hi,

Is there another job running in the system that blocks the 16-CPU
job, so that what you see is just backfilling of the empty slots in
the meantime, until that one blocking job ends?
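
If you want to see what the scheduler actually reserves, you could
switch on its monitoring output and watch the schedule file it then
writes (the path assumes the default cell "default"):

# qconf -msconf
(then set "params MONITOR=1" in the editor)
# tail -f $SGE_ROOT/default/common/schedule

Entries flagged RESERVING there show which slots are being held for
the pending parallel job.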

Cheers - Reuti


On 26.06.2006 at 18:40, Sili (wesley) Huang wrote:

> Hi Reuti,
>
>
>
> Yes, the parallel jobs are submitted with "-R y". SGE 6.0u8 is
> configured with the share-tree policy, and max_reservation is set
> to 10 (note that only 3 parallel jobs submitted with -R y are in
> the queue).
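>
> For illustration, the submissions look roughly like this (the PE
> name and the script names are only placeholders, not our real ones):
>
> # qsub -R y -pe mpi 16 parallel_job.sh
> # qsub serial_job.sh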
>
>
>
> The following scenario is the problem I mentioned:
>
> In "Time 0", the 16-CPU parallel job is waiting with high priority  
> "5.41" and it is supposed to reserve CPUs for this job. However, in  
> the later time, "Time 1", the low-priority serial jobs are still  
> loaded into the released processors.
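>
> As far as I understand, backfilling should only place a serial job
> onto reserved slots when its assumed run time (h_rt, or
> default_duration if no h_rt is requested) ends before the
> reservation is due to start. A quick way to check what run time the
> scheduler can assume for one of the serial jobs (385887 is just the
> example job-ID from the listing below):
>
> # qstat -j 385887 | grep h_rt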
>
>
>
>
>
> (1) Time 0
>
> # qstat -u dwright
>
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>  385882 5.41000 OPA        dwright      qw    06/26/2006 13:00:31                                   16
>
>
>
> # qstat -u l318e
>
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>  385884 4.50133 hello      l318e        r     06/26/2006 13:13:00 all.q at v60-n09                    1
>  385886 4.50133 hello      l318e        r     06/26/2006 13:21:00 all.q at v60-n29                    1
>  385885 4.50133 hello      l318e        r     06/26/2006 13:14:30 all.q at v60-n49                    1
>  385883 4.50133 hello      l318e        r     06/26/2006 13:11:29 all.q at v60-n68                    1
>  385815 4.50133 hello      l318e        r     06/26/2006 11:54:14 all.q at v60-n75                    1
>  385887 4.50000 hello      l318e        qw    06/26/2006 13:10:26                                    1
>  385888 4.50000 hello      l318e        qw    06/26/2006 13:10:34                                    1
>
>
>
>
>
>
>
> (2) Time 1
>
> # qstat -u dwright
>
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>  385882 5.41000 OPA        dwright      qw    06/26/2006 13:00:31                                   16
>
>
>
> # qstat -j 385882 | grep reserv
>
> reserve:                    y
>
>
>
> # qstat -u l318e
>
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>  385884 4.50171 hello      l318e        r     06/26/2006 13:13:00 all.q at v60-n09                    1
>  385888 4.50171 hello      l318e        r     06/26/2006 13:29:52 all.q at v60-n17                    1
>  385886 4.50171 hello      l318e        r     06/26/2006 13:21:00 all.q at v60-n29                    1
>  385887 4.50171 hello      l318e        r     06/26/2006 13:29:08 all.q at v60-n35                    1
>  385885 4.50171 hello      l318e        r     06/26/2006 13:14:30 all.q at v60-n49                    1
>  385883 4.50171 hello      l318e        r     06/26/2006 13:11:29 all.q at v60-n68                    1
>  385815 4.50171 hello      l318e        r     06/26/2006 11:54:14 all.q at v60-n75                    1
>
>
>
> I hope this explains the problem clearly.
>
>
>
> PS: My SGE scheduler settings:
>
>
>
> ----------------------------Begin of Text----------------------------------------------
> # qconf -ssconf
> algorithm                         default
> schedule_interval                 0:0:15
> maxujobs                          30
> queue_sort_method                 load
> job_load_adjustments              np_load_avg=0.50
> load_adjustment_decay_time        0:7:30
> load_formula                      np_load_avg
> schedd_job_info                   true
> flush_submit_sec                  0
> flush_finish_sec                  0
> params                            MONITOR=0
> reprioritize_interval             0:0:0
> halftime                          0
> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> compensation_factor               5.000000
> weight_user                       0.250000
> weight_project                    0.250000
> weight_department                 0.250000
> weight_job                        0.250000
> weight_tickets_functional         100000
> weight_tickets_share              100000
> share_override_tickets            TRUE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   5000
> report_pjob_tickets               TRUE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  OS
> weight_ticket                     0.010000
> weight_waiting_time               0.000000
> weight_deadline                   3600000.000000
> weight_urgency                    0.900000
> weight_priority                   9.000000
> max_reservation                   10
> default_duration                  1:0:0
> ------------------------------End of Text--------------------------------------------
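>
> One thing I am not sure about: with "default_duration 1:0:0" above
> and no h_rt requested on our jobs, the scheduler has to assume that
> every running job finishes within one hour when it plans the
> reservation. Submitting with explicit run-time limits might make the
> planning more realistic; a minimal sketch (the limits, PE name, and
> script names are only examples, not our real ones):
>
> # qsub -l h_rt=0:30:0 serial_job.sh
> # qsub -R y -pe mpi 16 -l h_rt=12:0:0 parallel_job.sh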
>
>
>
> Best regards,
>
> Sili(wesley) Huang
>
>
>
> Friday, June 23, 2006, 3:46:34 PM, you wrote:
>
>
>
> Reuti> Hi,
>
> Reuti> you submitted with "-R y" and adjusted the scheduler to
> Reuti> "max_reservation 20" or an appropriate value?
>
> Reuti> -- Reuti
>
>
>
>
>
> Reuti> On 23.06.2006 at 18:31, Sili (wesley) Huang wrote:
>
>
>
> >> Hi Jean-Paul,
>
>
>
>
>
>
>
> >> I have a similar problem to yours in our cluster: the low-
> >> priority serial jobs still get loaded into the run state while
> >> the high-priority parallel jobs are waiting. Did you figure out
> >> a solution to this problem? Did the upgrade help?
>
> >> Cheers.
>
> >> Best regards,
>
> >> Sili(wesley) Huang
>
>
>
>
>
>
>
>
>
> --
>
> mailto:shuang at unb.ca
>
> Scientific Computing Support
>
> Advanced Computational Research Laboratory
>
> University of New Brunswick
>
> Tel(office):  (506) 452-6348
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



