[GE users] Sort by sequence number question

Daniel Templeton Dan.Templeton at Sun.COM
Wed Jul 11 19:47:16 BST 2007


Wrik,

You may also want to read this post from Stephan's blog:

http://blog.sun.com/sgrell/entry/n1ge_6_scheduler_hacks_seperated

Daniel

Lönroth Erik wrote:
> I am.
>
> I'm submitting this job to the queue.
> bash-3.00$ cat slot-allocation.job 
> #!/bin/bash
> #$ -S /bin/bash
> #$ -N slot-allocation
> #$ -cwd 
> #$ -o output.$JOB_ID
> #$ -e errors.$JOB_ID
>
> #$ -pe powerflow_*_pe  5
>
> #$ -masterq master.*.q
> echo "Starting on: ${HOSTNAME}"
> echo "$PE_HOSTFILE contains:"
> cat $PE_HOSTFILE
> sleep 30
>
>
>
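
A side note on the job script above: `cat $PE_HOSTFILE` prints one line per queue instance, so a host that contributes slots from two queues shows up twice. Below is a minimal sketch of a per-host summary, assuming the usual four-column hostfile layout (host, slots, queue, processor range). The function name slots_per_host and the sample hostfile are made up for illustration.

```shell
#!/bin/sh
# slots_per_host FILE
# Sum the slot counts (column 2) per host (column 1) of a PE hostfile.
slots_per_host() {
    awk '{ slots[$1] += $2 } END { for (h in slots) print h, slots[h] }' "$1" | sort
}

# Illustrative hostfile; the host and queue names are hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
node0 1 master.q@node0 UNDEFINED
node0 4 short.q@node0 UNDEFINED
node1 2 short.q@node1 UNDEFINED
EOF

slots_per_host "$sample"
rm -f "$sample"
```

Inside a job the call would be slots_per_host "$PE_HOSTFILE".
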
>     215 0.55500 slot-alloc sssler       r     07/11/2007 17:19:24 master.101.q@ts101-1-0.sss.se. MASTER
>     215 0.55500 slot-alloc sssler       r     07/11/2007 17:19:24 short.101.q@ts101-1-0.sss.se.s SLAVE
>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>                                                                   short.101.q@ts101-1-0.sss.se.s SLAVE
>                                            
>
> I also sometimes see it allocate a slot on a "MASTER" node even though slots are still free on other machines: it takes only 3 of 4 slots on one host and puts the remaining slot on a completely different host.
>
>
> ... like here, for example, where ts103-3-13 gets 3 slots filled although it has 4 to offer. I would expect all 4 slots to be taken before master.103.q@ts103-3-0 is considered at all, since it has a higher sequence number. That doesn't seem to happen... *cry*
>
>  ( For those of you who have followed this thread: ts101 is a smaller test cluster we use for testing out queues, and ts103+ts102 are partitions of a larger SGE_CELL )
>
>
>
>     977 0.55500 slot-alloc sssler       r     07/11/2007 18:04:54 short.103.q@ts103-3-12.sss.se. SLAVE
>                                                                   short.103.q@ts103-3-12.sss.se. SLAVE
>                                                                   short.103.q@ts103-3-12.sss.se. SLAVE
>                                                                   short.103.q@ts103-3-12.sss.se. SLAVE
>     977 0.55500 slot-alloc sssler       r     07/11/2007 18:04:54 short.103.q@ts103-3-13.sss.se. SLAVE
>                                                                   short.103.q@ts103-3-13.sss.se. SLAVE
>                                                                   short.103.q@ts103-3-13.sss.se. SLAVE
>     977 0.55500 slot-alloc sssler       r     07/11/2007 18:04:54 short.103.q@ts103-3-14.sss.se. SLAVE
>                                                                   short.103.q@ts103-3-14.sss.se. SLAVE
>                                                                   short.103.q@ts103-3-14.sss.se. SLAVE
>                                                                   short.103.q@ts103-3-14.sss.se. SLAVE
>     977 0.55500 slot-alloc sssler       r     07/11/2007 18:04:54 short.103.q@ts103-3-15.sss.se. SLAVE
>     977 0.55500 slot-alloc sssler       r     07/11/2007 18:04:54 master.103.q@ts103-3-0.sss.se. MASTER
>     977 0.55500 slot-alloc sssler       r     07/11/2007 18:04:54 master.103.q@ts103-3-1.sss.se. SLAVE
>
> I'm in pain. Arhhhh!
>
> /Erik
>
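
To make the expectation above concrete: with queue_sort_method set to seqno, Erik expects queue instances to be filled strictly in ascending seq_no order, each one completely before the next is touched. The following is a minimal sketch of that expectation only. It is not Grid Engine's scheduler, it ignores -masterq and the PE allocation rule, and the function name, queue instances and numbers are all made up.

```shell
#!/bin/sh
# fill_by_seq_no NEEDED
# Reads lines "<queue instance> <seq_no> <free slots>" on stdin and prints
# "<queue instance> <slots taken>" in the order the slots would be used if
# instances were filled strictly by ascending seq_no.
fill_by_seq_no() {
    sort -k2,2n | awk -v needed="$1" '
    BEGIN { needed += 0 }                  # force a numeric value
    needed > 0 && $3 + 0 > 0 {
        free = $3 + 0
        take = (free < needed) ? free : needed
        print $1, take
        needed -= take
    }'
}

# Hypothetical instances: two compute queues and a high-seq_no master queue.
printf '%s\n' \
    'master.q@node0 9999 1' \
    'short.q@node2 20 4' \
    'short.q@node1 10 4' | fill_by_seq_no 6
```

With 6 slots requested, node1 is filled completely, node2 supplies the remaining 2, and the master instance is never reached. The job 977 listing above does not follow that pattern.
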
> -----Original Message-----
> From: Ravi Chandra Nallan [mailto:Ravichandra.Nallan at Sun.COM]
> Sent: Wed 7/11/2007 5:05 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Sort by sequence number question
>  
> Can you try with some simple batch/array jobs,
> e.g. qsub -t 1-5 examples/jobs/sleeper.sh 10000
> and see which queue gets filled first?
> regards,
> -Ravi
>
> Lönroth Erik wrote:
>   
>>> Didn't seem to work.
>>>  qconf -sconf 
>>>  qconf -ssconf
>>>  qconf -sq \*
>>>  qconf -se global
>>>
>>> Might be a better option.
>>> /mark
>>>     
>>>       
>> Here it goes:
>>
>> bash-3.00$ qconf -sconf
>> global:
>> execd_spool_dir              /opt/gridengine/narcissus/spool
>> mailer                       /opt/gridengine/scania/utils/mailing/mailer1.sh
>> xterm                        /usr/bin/X11/xterm
>> load_sensor                  /opt/gridengine/scania/utils/licensecheck.sh
>> prolog                       none
>> epilog                       none
>> shell_start_mode             posix_compliant
>> login_shells                 sh,ksh,csh,tcsh
>> min_uid                      0
>> min_gid                      0
>> user_lists                   none
>> xuser_lists                  none
>> projects                     none
>> xprojects                    none
>> enforce_project              false
>> enforce_user                 auto
>> load_report_time             00:00:40
>> max_unheard                  00:05:00
>> reschedule_unknown           00:00:00
>> loglevel                     log_warning
>> administrator_mail           erik.lonroth at scania.com
>> set_token_cmd                none
>> pag_cmd                      none
>> token_extend_time            none
>> shepherd_cmd                 none
>> qmaster_params               none
>> execd_params                 none
>> reporting_params             accounting=true reporting=false \
>>                              flush_time=00:00:15 joblog=false sharelog=00:00:00
>> finished_jobs                100
>> gid_range                    20000-20500
>> qlogin_command               /opt/gridengine/scania/utils/qlogin/qlogin.sh
>> qlogin_daemon                /usr/sbin/sshd -i
>> rlogin_daemon                /usr/sbin/sshd -i
>> max_aj_instances             0
>> max_aj_tasks                 0
>> max_u_jobs                   0
>> max_jobs                     0
>> auto_user_oticket            0
>> auto_user_fshare             0
>> auto_user_default_project    none
>> auto_user_delete_time        86400
>> delegated_file_staging       false
>> rsh_daemon                   /usr/sbin/sshd -i
>> rsh_command                  /usr/bin/ssh
>> rlogin_command               /usr/bin/ssh
>> reprioritize                 0
>>
>>
>>
>>
>> bash-3.00$   qconf -ssconf
>> algorithm                         default
>> schedule_interval                 0:0:15
>> maxujobs                          0
>> queue_sort_method                 seqno
>> job_load_adjustments              np_load_avg=0.50
>> load_adjustment_decay_time        0:7:30
>> load_formula                      np_load_avg
>> schedd_job_info                   true
>> flush_submit_sec                  0
>> flush_finish_sec                  0
>> params                            none
>> reprioritize_interval             0:0:0
>> halftime                          168
>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>> compensation_factor               5.000000
>> weight_user                       0.250000
>> weight_project                    0.250000
>> weight_department                 0.250000
>> weight_job                        0.250000
>> weight_tickets_functional         0
>> weight_tickets_share              0
>> share_override_tickets            TRUE
>> share_functional_shares           TRUE
>> max_functional_jobs_to_schedule   200
>> report_pjob_tickets               TRUE
>> max_pending_tasks_per_job         50
>> halflife_decay_list               none
>> policy_hierarchy                  OFS
>> weight_ticket                     0.010000
>> weight_waiting_time               0.000000
>> weight_deadline                   3600000.000000
>> weight_urgency                    0.100000
>> weight_priority                   1.000000
>> max_reservation                   0
>> default_duration                  0:10:0
>>
>>
>>
>> bash-3.00$   qconf -sq \*
>> qname                 master.101.q
>> hostlist              ts101-1-0.sss.se.scania.com
>> seq_no                0
>> load_thresholds       np_load_avg=1.75
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               dummy_ts101_pe fire_101_pe fluent_ts101_pe make \
>>                       mpich_ts101_pe powerflow_ts101_pe
>> rerun                 FALSE
>> slots                 1
>> tmpdir                /tmp
>> shell                 /bin/csh
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               INFINITY
>> h_stack               INFINITY
>> s_core                INFINITY
>> h_core                INFINITY
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>>
>>
>>
>> qname                 short.101.q
>> hostlist              @ts101_X_hg
>> seq_no                101,[ts101-1-0.sss.se.scania.com=9999]
>> load_thresholds       np_load_avg=1.75
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            UNDEFINED
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               dummy_ts101_pe fire_101_pe fluent_ts101_pe make \
>>                       mpich_ts101_pe powerflow_ts101_pe
>> rerun                 FALSE
>> slots                 4
>> tmpdir                /tmp
>> shell                 /bin/csh
>> prolog                NONE
>> epilog                NONE
>> shell_start_mode      posix_compliant
>> starter_method        NONE
>> suspend_method        NONE
>> resume_method         NONE
>> terminate_method      NONE
>> notify                00:00:60
>> owner_list            NONE
>> user_lists            NONE
>> xuser_lists           NONE
>> subordinate_list      NONE
>> complex_values        NONE
>> projects              NONE
>> xprojects             NONE
>> calendar              NONE
>> initial_state         default
>> s_rt                  INFINITY
>> h_rt                  INFINITY
>> s_cpu                 INFINITY
>> h_cpu                 INFINITY
>> s_fsize               INFINITY
>> h_fsize               INFINITY
>> s_data                INFINITY
>> h_data                INFINITY
>> s_stack               INFINITY
>> h_stack               INFINITY
>> s_core                INFINITY
>> h_core                INFINITY
>> s_rss                 INFINITY
>> h_rss                 INFINITY
>> s_vmem                INFINITY
>> h_vmem                INFINITY
>>
>>
>>
>>
>>
>> bash-3.00$ qconf -se global
>> hostname              global
>> load_scaling          NONE
>> complex_values        fluent_all=10,fluent_par=48,gtpowerx=7,dyna=18
>> load_values           dyna=2,fluent_all=8,fluent_par=41,gtpowerx=2
>> processors            0
>> user_lists            NONE
>> xuser_lists           NONE
>> projects              NONE
>> xprojects             NONE
>> usage_scaling         NONE
>> report_variables      NONE
>>
>>   
>> ------------------------------------------------------------------------
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>   
>>     
>   
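
One detail in the queue configuration quoted above: the seq_no line of short.101.q, `101,[ts101-1-0.sss.se.scania.com=9999]`, is a default followed by a per-host override, so the instance on ts101-1-0 sorts at 9999 and every other instance of that queue at 101. Below is a minimal sketch for resolving such a value for one host. The function name effective_seq_no and the second host name are made up; it assumes the value is on a single line (no "\" continuation) and only matches exact host names, not host groups.

```shell
#!/bin/sh
# effective_seq_no SPEC HOST
# SPEC is a seq_no value as printed by `qconf -sq`, for example
#   101,[ts101-1-0.sss.se.scania.com=9999]
# Prints the override for HOST if there is one, otherwise the default.
effective_seq_no() {
    printf '%s\n' "$1" | awk -v host="$2" '
    {
        n = split($0, parts, ",")
        result = parts[1]                          # first field is the default
        for (i = 2; i <= n; i++) {
            entry = parts[i]
            gsub(/^[ \t]*\[|\][ \t]*$/, "", entry) # strip the surrounding brackets
            eq = index(entry, "=")
            if (eq > 0 && substr(entry, 1, eq - 1) == host)
                result = substr(entry, eq + 1)
        }
        print result
    }'
}

spec='101,[ts101-1-0.sss.se.scania.com=9999]'
effective_seq_no "$spec" ts101-1-0.sss.se.scania.com   # host with an override
effective_seq_no "$spec" some-other-host.example.com   # hypothetical host, gets the default
```
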






