[GE users] can't get subordinate queues to work

cjf001 john.foley at motorola.com
Fri Jun 5 20:46:12 BST 2009


All -

I've been trying to verify the operation of queue suspension in my cluster,
and I haven't been able to get it working.

I'm running version 6.2u2.

I have 2 cluster queues, "primary" and "secondary", and in the primary
queue's "Subordinates" tab (in qmon) I have the secondary queue listed,
and Max Slots is empty. I've pasted the qconf outputs for these queues
below for those that want to reference them.

I have defined a hostgroup (sysadm_sim) with just my test machine in
it - it's a 4-core box. Per some recent posts on the mailing list, I've also
set the "slots" complex on this host to 4. I have a test program which
just sleeps for 30 seconds and then quits. I submit this program 4 times,
using the command:

     qsub -clear -cwd -V -q secondary@@sysadm_sim  test_30

and qstat shows all 4 jobs running in the secondary queue on the test host:

cjf001@lxint05# qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
     557 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
     558 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
     559 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
     560 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1

So far, so good.

Then I submit the same job to the primary queue, using the command:

     qsub -clear -cwd -V -q primary@@sysadm_sim  test_30

What I'd expect to happen is that one of the jobs in the secondary queue
would be stopped to allow the primary job to start - however, it doesn't.
All the secondary jobs finish normally (in 30 seconds), and then the
primary job runs.
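
For anyone who wants to repeat the test, the steps above can be sketched as a
small script. This is only a sketch: the DRY_RUN variable and the submit and
reproduce helpers are names made up for illustration, not Grid Engine
features. With DRY_RUN=1 (the default here) it just prints the qsub commands
rather than running them.

```shell
#!/bin/sh
# Sketch of the reproduction steps. DRY_RUN, submit and reproduce are
# hypothetical helper names, not part of Grid Engine.
DRY_RUN="${DRY_RUN:-1}"
HOSTGROUP="@sysadm_sim"
JOB="test_30"

# Print the qsub command for one queue; also run it unless DRY_RUN=1.
submit() {
    queue="$1"
    cmd="qsub -clear -cwd -V -q ${queue}@${HOSTGROUP} ${JOB}"
    echo "$cmd"
    if [ "$DRY_RUN" != "1" ]; then
        $cmd
    fi
}

# Four jobs to fill the secondary queue on the host, then one primary job.
reproduce() {
    i=1
    while [ "$i" -le 4 ]; do
        submit secondary
        i=$((i + 1))
    done
    submit primary
}

reproduce
```

Setting DRY_RUN=0 in the environment would actually submit the jobs; running
qstat afterwards shows where they landed.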

What am I missing here? I want the primary job to supersede/suspend the
secondary job(s).

    Thanks,

       John



cjf001@lxint05# qconf -sq primary
qname                 primary
hostlist              @ACoE_sim @AIG_sim @Mech_sim @microcluster @minicluster \
                       @sysadm_sim
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH
ckpt_list             NONE
pe_list               make standard_pe
rerun                 TRUE
slots                 4
tmpdir                /tmp
shell                 /bin/bash
prolog                /appl/sun/grid_engine/site_PCSRL/scripts/prolog.standard
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:10
owner_list            NONE
user_lists            none_group,[@Mech_sim=amsl_group],[@AIG_sim=aig_group], \
                       [@microcluster=aig_group],[@minicluster=aig_group], \
                       [@sysadm_sim=sysadm_group]
xuser_lists           NONE
subordinate_list      secondary,[@sysadm_sim=secondary]
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
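
To make the subordinate_list setting in the dump above easier to read, here is
a rough sketch that lists each entry and whether it carries an explicit "=N"
slot threshold (the "Max Slots" field in qmon). The list_subordinates name is
made up for illustration, and the parsing assumes at most one queue per
bracketed host override. My understanding of queue_conf(5), which is worth
double-checking, is that an entry without a threshold suspends the subordinate
queue only once all slots of the primary queue instance are filled.

```shell
#!/bin/sh
# Sketch: read "qconf -sq <queue>" output on stdin and print one line per
# subordinate_list entry as "<scope> <queue> threshold=<N|unset>".
# list_subordinates is a hypothetical helper, not a Grid Engine command.
list_subordinates() {
    awk '
        # Glue backslash-continued lines together before inspecting them.
        /\\$/ { sub(/[ \t]*\\$/, ""); buf = buf $0; next }
        {
            line = buf $0; buf = ""
            if (line !~ /^subordinate_list[ \t]/) next
            sub(/^subordinate_list[ \t]+/, "", line)
            gsub(/[ \t]/, "", line)
            if (line == "NONE") next
            n = split(line, entries, ",")
            for (i = 1; i <= n; i++) {
                e = entries[i]
                scope = "default"
                # Host overrides look like [@hostgroup=queue] or [@hostgroup=queue=N].
                if (e ~ /^\[/) {
                    gsub(/\[/, "", e); gsub(/\]/, "", e)
                    scope = substr(e, 1, index(e, "=") - 1)
                    e = substr(e, index(e, "=") + 1)
                }
                if (e ~ /=/) {
                    printf "%s %s threshold=%s\n", scope, \
                        substr(e, 1, index(e, "=") - 1), substr(e, index(e, "=") + 1)
                } else {
                    printf "%s %s threshold=unset\n", scope, e
                }
            }
        }
    '
}

# Intended usage on a cluster host:
#   qconf -sq primary | list_subordinates
```

Fed the primary queue dump above, this should report both the default entry
and the @sysadm_sim override as having no threshold set.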




cjf001@lxint05# qconf -sq secondary
qname                 secondary
hostlist              @ACoE_sim @AIG_sim @Mech_sim @microcluster @minicluster \
                       @sysadm_sim
seq_no                50
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH
ckpt_list             NONE
pe_list               make standard_pe
rerun                 FALSE
slots                 4
tmpdir                /tmp
shell                 /bin/bash
prolog                /appl/sun/grid_engine/site_PCSRL/scripts/prolog.standard
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:10
owner_list            NONE
user_lists            NONE,[@sysadm_sim=sysadm_group]
xuser_lists           sysadm_group,[@sysadm_sim=NONE]
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY



-- 
###########################################################################
# John Foley                          # Location:  IL93-E1-21S            #
# IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
# Antenna & Mechanical Simulation Grp #    Email: john.foley at motorola.com #
# Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
# 600 North US Highway 45             #      Fax: (847) 523-5767          #
# Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
###########################################################################
                 (this email sent using Mozilla on Windows)

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201015
