[GE users] can't get subordinate queues to work

cjf001 john.foley at motorola.com
Tue Jun 9 16:13:18 BST 2009


OK, I think I found the issue with this....

As I mentioned, I had set the "slots" complex on this host to 4,
per some recent posts on the mailing list. That seems to have caused
the scheduler to decide that the queue was never full enough to
suspend. As soon as I took that complex out (deleted it, actually),
the suspension started working as expected.

     John
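
For anyone checking their own setup for the same problem, here is a minimal
sketch of how to see whether an execution host pins the "slots" complex.
It relies on "qconf -se <host>", which prints the execution host
configuration. The helper name host_slots_limit is hypothetical, and the
parsing assumes complex_values fits on a single line of that output (no
backslash continuation), which may not hold on every cluster.

```shell
#!/bin/sh
# Sketch only: print the host-level "slots" value, if one is set.
# host_slots_limit is a hypothetical helper, not a Grid Engine command.
# Assumption: complex_values is printed on one line by "qconf -se".
host_slots_limit() {
    # $1 is the execution host name to inspect.
    qconf -se "$1" |
        awk '$1 == "complex_values" {
                 # Split "a=1,slots=4,..." into individual name=value pairs.
                 n = split($2, kv, ",")
                 for (i = 1; i <= n; i++)
                     if (kv[i] ~ /^slots=/) {
                         sub(/^slots=/, "", kv[i])
                         print kv[i]
                     }
             }'
}

# Example call (prints nothing when no host-level slots value is set):
#   host_slots_limit "$host"
```

If the function prints a value, the entry can be removed by editing the
host configuration with "qconf -me <host>" and taking "slots" out of
complex_values, which is the change described above.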


cjf001 wrote:

> bump....
> 
> Gee, is no one having any problems using subordinate queues and job
> suspension? What the heck am I doing wrong?!
> 
>      John
> 
> 
> cjf001 wrote:
> 
> 
>>All -
>>
>>I've been trying to verify the operation of queue suspension in my cluster,
>>and I've not been able to get it working!
>>
>>I'm running version 6.2u2.
>>
>>I have 2 cluster queues, "primary" and "secondary", and in the primary
>>queue's "Subordinates" tab (in qmon) I have the secondary queue listed,
>>and Max Slots is empty. I've pasted the qconf outputs for these queues
>>below for those that want to reference them.
>>
>>I have defined a hostgroup (sysadm_sim) with just my test machine in
>>it - it's a 4-core box. Per some recent posts on the mailing list, I've also
>>set the "slots" complex on this host to 4. I have a test program which
>>just sleeps for 30 seconds and then quits. I submit this program 4 times,
>>using the command:
>>
>>     qsub -clear -cwd -V -q secondary@@sysadm_sim  test_30
>>
>>and qstat shows all 4 jobs running in the secondary queue on the test host:
>>
>>cjf001@lxint05# qstat
>>job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>>-----------------------------------------------------------------------------------------------------------------
>>     557 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
>>     558 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
>>     559 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
>>     560 200.00000 start_test cjf001       r     06/05/2009 14:44:37 secondary@lxdelt1.srl.css.mot.     1
>>
>>So far, so good.
>>
>>Then I submit the same job to the primary queue, using the command:
>>
>>     qsub -clear -cwd -V -q primary@@sysadm_sim  test_30
>>
>>What I'd expect to happen is that one of the jobs in the secondary queue
>>would be stopped to allow the primary job to start - however, it doesn't.
>>All the secondary jobs finish normally (in 30 seconds), and then the
>>primary job runs.
>>
>>What am I missing here? I want the primary job to supersede/suspend the
>>secondary job(s).
>>
>>    Thanks,
>>
>>       John
>>
>>
>>
>>cjf001@lxint05# qconf -sq primary
>>qname                 primary
>>hostlist              @ACoE_sim @AIG_sim @Mech_sim @microcluster @minicluster \
>>                       @sysadm_sim
>>seq_no                0
>>load_thresholds       np_load_avg=1.75
>>suspend_thresholds    NONE
>>nsuspend              1
>>suspend_interval      00:05:00
>>priority              0
>>min_cpu_interval      00:05:00
>>processors            UNDEFINED
>>qtype                 BATCH
>>ckpt_list             NONE
>>pe_list               make standard_pe
>>rerun                 TRUE
>>slots                 4
>>tmpdir                /tmp
>>shell                 /bin/bash
>>prolog                /appl/sun/grid_engine/site_PCSRL/scripts/prolog.standard
>>epilog                NONE
>>shell_start_mode      unix_behavior
>>starter_method        NONE
>>suspend_method        NONE
>>resume_method         NONE
>>terminate_method      NONE
>>notify                00:00:10
>>owner_list            NONE
>>user_lists            none_group,[@Mech_sim=amsl_group],[@AIG_sim=aig_group], \
>>                       [@microcluster=aig_group],[@minicluster=aig_group], \
>>                       [@sysadm_sim=sysadm_group]
>>xuser_lists           NONE
>>subordinate_list      secondary,[@sysadm_sim=secondary]
>>complex_values        NONE
>>projects              NONE
>>xprojects             NONE
>>calendar              NONE
>>initial_state         default
>>s_rt                  INFINITY
>>h_rt                  INFINITY
>>s_cpu                 INFINITY
>>h_cpu                 INFINITY
>>s_fsize               INFINITY
>>h_fsize               INFINITY
>>s_data                INFINITY
>>h_data                INFINITY
>>s_stack               INFINITY
>>h_stack               INFINITY
>>s_core                INFINITY
>>h_core                INFINITY
>>s_rss                 INFINITY
>>h_rss                 INFINITY
>>s_vmem                INFINITY
>>h_vmem                INFINITY
>>
>>
>>
>>
>>cjf001@lxint05# qconf -sq secondary
>>qname                 secondary
>>hostlist              @ACoE_sim @AIG_sim @Mech_sim @microcluster @minicluster \
>>                       @sysadm_sim
>>seq_no                50
>>load_thresholds       np_load_avg=1.75
>>suspend_thresholds    NONE
>>nsuspend              1
>>suspend_interval      00:05:00
>>priority              0
>>min_cpu_interval      00:05:00
>>processors            UNDEFINED
>>qtype                 BATCH
>>ckpt_list             NONE
>>pe_list               make standard_pe
>>rerun                 FALSE
>>slots                 4
>>tmpdir                /tmp
>>shell                 /bin/bash
>>prolog                /appl/sun/grid_engine/site_PCSRL/scripts/prolog.standard
>>epilog                NONE
>>shell_start_mode      unix_behavior
>>starter_method        NONE
>>suspend_method        NONE
>>resume_method         NONE
>>terminate_method      NONE
>>notify                00:00:10
>>owner_list            NONE
>>user_lists            NONE,[@sysadm_sim=sysadm_group]
>>xuser_lists           sysadm_group,[@sysadm_sim=NONE]
>>subordinate_list      NONE
>>complex_values        NONE
>>projects              NONE
>>xprojects             NONE
>>calendar              NONE
>>initial_state         default
>>s_rt                  INFINITY
>>h_rt                  INFINITY
>>s_cpu                 INFINITY
>>h_cpu                 INFINITY
>>s_fsize               INFINITY
>>h_fsize               INFINITY
>>s_data                INFINITY
>>h_data                INFINITY
>>s_stack               INFINITY
>>h_stack               INFINITY
>>s_core                INFINITY
>>h_core                INFINITY
>>s_rss                 INFINITY
>>h_rss                 INFINITY
>>s_vmem                INFINITY
>>h_vmem                INFINITY
>>
>>
>>



-- 
###########################################################################
# John Foley                          # Location:  IL93-E1-21S            #
# IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
# Antenna & Mechanical Simulation Grp #    Email: john.foley at motorola.com #
# Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
# 600 North US Highway 45             #      Fax: (847) 523-5767          #
# Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
###########################################################################
                 (this email sent using Mozilla on Windows)

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201317
