[GE users] can't get subordinate queues to work

cjf001 john.foley at motorola.com
Mon Jun 8 20:29:56 BST 2009


bump....

Gee, is no one else having any problems using subordinate queues and job
suspension? What the heck am I doing wrong?!

     John


cjf001 wrote:

> All -
> 
> I've been trying to verify the operation of queue suspension in my cluster,
> and I've not been able to get it working!
> 
> I'm running version 6.2u2.
> 
> I have 2 cluster queues, "primary" and "secondary", and in the primary
> queue's "Subordinates" tab (in qmon) I have the secondary queue listed,
> and Max Slots is empty. I've pasted the qconf outputs for these queues
> below for those that want to reference them.
> 
> I have defined a hostgroup (sysadm_sim) with just my test machine in
> it - it's a 4-core box. Per some recent posts on the mailing list, I've also
> set the "slots" complex on this host to 4. I have a test program which
> just sleeps for 30 seconds and then quits. I submit this program 4 times,
> using the command:
> 
>      qsub -clear -cwd -V -q secondary@@sysadm_sim  test_30
> 
> and qstat shows all 4 jobs running in the secondary queue on the test host:
> 
> cjf001@lxint05# qstat
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>      557 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
>      558 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
>      559 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
>      560 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
> 
> So far, so good.
> 
> Then I submit the same job to the primary queue, using the command:
> 
>      qsub -clear -cwd -V -q primary@@sysadm_sim  test_30
> 
> What I'd expect to happen is that one of the jobs in the secondary queue
> would be stopped to allow the primary job to start - however, it doesn't.
> All the secondary jobs finish normally (in 30 seconds), and then the
> primary job runs.
> 
> What am I missing here? I want the primary job to supersede/suspend the
> secondary job(s).
> 
>     Thanks,
> 
>        John
> 
> 
> 
> cjf001@lxint05# qconf -sq primary
> qname                 primary
> hostlist              @ACoE_sim @AIG_sim @Mech_sim @microcluster @minicluster \
>                        @sysadm_sim
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH
> ckpt_list             NONE
> pe_list               make standard_pe
> rerun                 TRUE
> slots                 4
> tmpdir                /tmp
> shell                 /bin/bash
> prolog                /appl/sun/grid_engine/site_PCSRL/scripts/prolog.standard
> epilog                NONE
> shell_start_mode      unix_behavior
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:10
> owner_list            NONE
> user_lists            none_group,[@Mech_sim=amsl_group],[@AIG_sim=aig_group], \
>                        [@microcluster=aig_group],[@minicluster=aig_group], \
>                        [@sysadm_sim=sysadm_group]
> xuser_lists           NONE
> subordinate_list      secondary,[@sysadm_sim=secondary]
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
> 
> 
> 
> 
> cjf001@lxint05# qconf -sq secondary
> qname                 secondary
> hostlist              @ACoE_sim @AIG_sim @Mech_sim @microcluster @minicluster \
>                        @sysadm_sim
> seq_no                50
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH
> ckpt_list             NONE
> pe_list               make standard_pe
> rerun                 FALSE
> slots                 4
> tmpdir                /tmp
> shell                 /bin/bash
> prolog                /appl/sun/grid_engine/site_PCSRL/scripts/prolog.standard
> epilog                NONE
> shell_start_mode      unix_behavior
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:10
> owner_list            NONE
> user_lists            NONE,[@sysadm_sim=sysadm_group]
> xuser_lists           sysadm_group,[@sysadm_sim=NONE]
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
> 
> 
> 
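
A note on the empty "Max Slots" field, which appears as the bare
"secondary" entry in the subordinate_list of the primary queue quoted above.
According to queue_conf(5), an entry without "=N" means the subordinate queue
is suspended only once all slots of the parent queue instance are filled,
whereas an entry such as "secondary=1" triggers as soon as one slot is filled.
The following is a minimal shell sketch of that rule only. would_suspend is a
hypothetical helper written for illustration, not a Grid Engine command, and
the numbers in the usage lines are simply the slot counts from this test setup.

```shell
#!/bin/sh
# Sketch of the queue-wise subordination rule described in queue_conf(5).
# would_suspend is a hypothetical helper, not part of Grid Engine.
#   entry "name=N" -> suspend when at least N slots of the parent are filled
#   entry "name"   -> suspend only when ALL slots of the parent are filled
would_suspend() {
    entry=$1        # subordinate_list entry, e.g. "secondary" or "secondary=1"
    total_slots=$2  # "slots" value of the parent queue instance
    used_slots=$3   # slots currently filled in the parent queue instance

    case $entry in
        *=*) threshold=${entry#*=} ;;   # explicit threshold after "="
        *)   threshold=$total_slots ;;  # no threshold: all slots must be filled
    esac

    if [ "$used_slots" -ge "$threshold" ]; then
        echo yes
    else
        echo no
    fi
}

# One job running in a 4-slot primary queue instance:
would_suspend secondary 4 1      # prints "no"  (threshold defaults to 4)
would_suspend secondary=1 4 1    # prints "yes" (threshold is 1)
```

Either way, the rule is evaluated against slots already filled in the primary
queue on that host, so it only comes into play once a primary job has actually
been started there.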



-- 
###########################################################################
# John Foley                          # Location:  IL93-E1-21S            #
# IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
# Antenna & Mechanical Simulation Grp #    Email: john.foley at motorola.com #
# Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
# 600 North US Highway 45             #      Fax: (847) 523-5767          #
# Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
###########################################################################
                 (this email sent using Mozilla on Windows)

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201209
