[GE users] Advance reservation strange behavior

Sili (wesley) Huang shuang at unb.ca
Mon Jun 26 17:40:05 BST 2006




Hi Reuti,


Yes, the parallel jobs are submitted with "-R y". SGE 6.0u8 is configured with the
share-tree policy, and max_reservation is set to 10 (note that only 3 parallel jobs
submitted with -R y are in the queue).
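
For reference, the submissions and the reservation setting look roughly like this (the
PE name "mpi" and the script names are only placeholders here, not our exact setup):

# submit the 16-CPU parallel job with resource reservation enabled
qsub -R y -pe mpi 16 ./opa_job.sh

# the serial jobs are plain submissions
qsub ./hello.sh

# the reservation limit lives in the scheduler configuration
qconf -msconf                          # edit "max_reservation  10"
qconf -ssconf | grep max_reservation   # verify the current value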


The following scenario shows the problem I mentioned:

At "Time 0", the 16-CPU parallel job is waiting with a high priority of "5.41", so the
scheduler should be reserving CPUs for it. However, at the later "Time 1", the
low-priority serial jobs are still being loaded onto the processors as they are released.



(1) Time 0

# qstat -u dwright

job-ID  prior   name       user         state submit/start at     queue              slots ja-task-ID
-----------------------------------------------------------------------------------------------------
 385882 5.41000 OPA        dwright      qw    06/26/2006 13:00:31                       16


# qstat -u l318e

job-ID  prior   name       user         state submit/start at     queue              slots ja-task-ID
-----------------------------------------------------------------------------------------------------
 385884 4.50133 hello      l318e        r     06/26/2006 13:13:00 all.q@v60-n09          1
 385886 4.50133 hello      l318e        r     06/26/2006 13:21:00 all.q@v60-n29          1
 385885 4.50133 hello      l318e        r     06/26/2006 13:14:30 all.q@v60-n49          1
 385883 4.50133 hello      l318e        r     06/26/2006 13:11:29 all.q@v60-n68          1
 385815 4.50133 hello      l318e        r     06/26/2006 11:54:14 all.q@v60-n75          1
 385887 4.50000 hello      l318e        qw    06/26/2006 13:10:26                        1
 385888 4.50000 hello      l318e        qw    06/26/2006 13:10:34                        1




(2) Time 1

# qstat -u dwright

job-ID  prior   name       user         state submit/start at     queue              slots ja-task-ID
-----------------------------------------------------------------------------------------------------
 385882 5.41000 OPA        dwright      qw    06/26/2006 13:00:31                       16


# qstat -j 385882 | grep reserv

reserve:                    y


# qstat -u l318e

job-ID  prior   name       user         state submit/start at     queue              slots ja-task-ID
-----------------------------------------------------------------------------------------------------
 385884 4.50171 hello      l318e        r     06/26/2006 13:13:00 all.q@v60-n09          1
 385888 4.50171 hello      l318e        r     06/26/2006 13:29:52 all.q@v60-n17          1
 385886 4.50171 hello      l318e        r     06/26/2006 13:21:00 all.q@v60-n29          1
 385887 4.50171 hello      l318e        r     06/26/2006 13:29:08 all.q@v60-n35          1
 385885 4.50171 hello      l318e        r     06/26/2006 13:14:30 all.q@v60-n49          1
 385883 4.50171 hello      l318e        r     06/26/2006 13:11:29 all.q@v60-n68          1
 385815 4.50171 hello      l318e        r     06/26/2006 11:54:14 all.q@v60-n75          1


I hope this explains the problem clearly.
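
To double-check whether the scheduler actually creates the reservation, I am thinking of
turning on scheduler monitoring (a sketch only; the path assumes the default cell):

# qconf -msconf                 (change "params MONITOR=0" to "params MONITOR=1")
# grep 385882 $SGE_ROOT/default/common/schedule

If the reservation were being made, entries for job 385882 marked RESERVING should show
up in that schedule file after the next scheduling runs.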


PS: My SGE scheduler settings:


----------------------------Begin of Text----------------------------------------------

# qconf -ssconf
algorithm                         default
schedule_interval                 0:0:15
maxujobs                          30
queue_sort_method                 load
job_load_adjustments              np_load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  0
flush_finish_sec                  0
params                            MONITOR=0
reprioritize_interval             0:0:0
halftime                          0
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         100000
weight_tickets_share              100000
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   5000
report_pjob_tickets               TRUE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OS
weight_ticket                     0.010000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.900000
weight_priority                   9.000000
max_reservation                   10
default_duration                  1:0:0

------------------------------End of Text--------------------------------------------
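
One more note on the configuration above: as far as I understand, because the serial
jobs do not request a run-time limit (h_rt), the scheduler assumes default_duration
(1:0:0 above) as their run time when it computes reservations. If that matters, giving
them an explicit limit might make the reservation math more accurate, e.g. (the
30-minute value and ./hello.sh are only placeholders):

qsub -l h_rt=0:30:0 ./hello.sh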


Best regards,

Sili(wesley) Huang


Friday, June 23, 2006, 3:46:34 PM, you wrote:


Reuti> Hi,


Reuti> you submitted with "-R y" and adjusted the scheduler to "max_reservation 20" or an appropriate value?


Reuti> -- Reuti



Reuti> On 23.06.2006 at 18:31, Sili (wesley) Huang wrote:


>> Hi Jean-Paul,




>> I have a similar problem in our cluster: the low-priority serial jobs still get
>> loaded into the run state while the high-priority parallel jobs are waiting.
>> Did you figure out a solution to this problem? Does the upgrade help?




>> Cheers.




>> Best regards,


>> Sili(wesley) Huang








--

mailto:shuang at unb.ca

Scientific Computing Support

Advanced Computational Research Laboratory

University of New Brunswick

Tel(office):  (506) 452-6348



