[GE users] scheduling weirdness in 6.0u3

Sean Dilda agrajag at dragaera.net
Thu Apr 7 14:54:39 BST 2005


Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
> 
> Sean Dilda wrote:
>
>>
>>
>>The strange thing is that I'm getting different results on my production 
>>cluster and test cluster.  The test cluster was seeing problems only 
>>with parallel jobs, and the above patch seems to have fixed it.  The 
>>production cluster is having issues with the parallel and non-parallel 
>>jobs, and that patch didn't seem to change anything.  I'll do some more 
>>testing and let you know if I can figure out why it's going haywire.
>>
> 
> Well, could it be that you use a slightly different queue and scheduler
> configuration? Could you post your configuration? The list might be able
> to help you. Or just compare the configurations between the two grids.

The main configuration and scheduler configs match.  I'm slowly going 
through and making the two configs as close as possible.  I've also 
attached the main configuration, scheduler config, and my two queue 
configs in case anyone wants to take a peek.
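
For anyone who wants to compare two clusters' settings mechanically rather
than by eye, here is a minimal sketch. It assumes each configuration has been
saved as plain "key value" text like the attachments below, with long values
continued by a trailing backslash. The function names parse_sge_config and
diff_configs are made up for this illustration and are not part of Grid
Engine. The sample input reuses a few lines from the two attached queue
configs.

```python
def parse_sge_config(text):
    """Parse 'key   value' lines, joining backslash-continued lines."""
    settings = {}
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            # Continuation: keep the fragment and wait for the next line.
            pending += line[:-1].strip() + " "
            continue
        line = pending + line
        pending = ""
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Sample input: a few lines copied from the attached queue configs.
highprio = parse_sge_config(
    "priority              0\n"
    "slots                 2\n"
    "load_thresholds       NONE\n"
)
lowprio = parse_sge_config(
    "priority              19\n"
    "slots                 2,[@cod=1]\n"
    "load_thresholds       np_load_avg=0.85,mem_free=50M\n"
)

for key, (left, right) in diff_configs(highprio, lowprio).items():
    print(f"{key}: {left!r} != {right!r}")
```

The same two functions should work on output saved from qconf -sq, qconf
-ssconf and qconf -sconf on each cluster, since all three use this key/value
layout. A key that exists on only one side shows up with None on the other.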

> 
> 
>>On a slightly different note, is there any word on when 6.0u4 might be 
>>released?  I notice there have been a number of updates between 6.0u3 and 
>>the current maintrunk.
>> 
>>
> 
> We are currently in the test phase and will have u4 ready as soon as our
> tests are done. I do not know the exact schedule.
> 

Thanks for the info.


    [ Part 2: "Attached Text" ]

qname                 highprio.q
hostlist              @cbcb @nsoe @stat @compeb @chg @bio
seq_no                0
load_thresholds       NONE
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               high
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE,[@cbcb=cbcb],[@nsoe=nsoe],[@stat=stat], \
                      [@compeb=compeb],[@chg=chg],[@bio=bio]
xuser_lists           NONE
subordinate_list      NONE
complex_values        highprio=1,centos=1
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY


    [ Part 3: "Attached Text" ]

qname                 lowprio.q
hostlist              @core @cbcb @nsoe @stat @compeb @chg @opteron @bio @cod
seq_no                0
load_thresholds       np_load_avg=0.85,mem_free=50M
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              19
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               low-all,[@core=low-all low-core],[@opteron=amd64]
rerun                 FALSE
slots                 2,[@cod=1]
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        centos=1,[@opteron=opteron=1]
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
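
Several values in the two queue configs above combine a default with
bracketed per-hostgroup overrides, for example "slots 2,[@cod=1]". When
comparing clusters it can help to split such a value into its parts. The
sketch below only separates the pieces of the string; it makes no claim about
how Grid Engine chooses among overrides for a given host, and the name
split_overrides is invented for this example.

```python
import re

# One bracketed override: [target=value]. The value may itself contain '='.
_OVERRIDE = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def split_overrides(value):
    """Split 'default,[target=value],...' into (default, {target: value})."""
    overrides = {
        match.group(1).strip(): match.group(2).strip()
        for match in _OVERRIDE.finditer(value)
    }
    # Everything before the first '[' is the default; drop the joining comma.
    default = value.split("[", 1)[0].strip().rstrip(",").strip()
    return default, overrides


# Values copied from the lowprio.q attachment.
for sample in (
    "2,[@cod=1]",
    "low-all,[@core=low-all low-core],[@opteron=amd64]",
    "centos=1,[@opteron=opteron=1]",
):
    print(split_overrides(sample))
```

A value without brackets, such as "np_load_avg=0.85,mem_free=50M", comes back
unchanged as the default with an empty override table.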


    [ Part 4: "Attached Text" ]

algorithm                         default
schedule_interval                 0:0:15
maxujobs                          260
queue_sort_method                 load
job_load_adjustments              np_load_avg=1.0,mem_free=900M
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  1
flush_finish_sec                  1
params                            none
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         0
weight_tickets_share              0
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               TRUE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.100000
weight_waiting_time               0.040000
weight_deadline                   3600000.000000
weight_urgency                    1.000000
weight_priority                   0.010000
max_reservation                   0
default_duration                  120:0:0


    [ Part 5: "Attached Text" ]

global:
execd_spool_dir              /var/lib/sge_execd
mailer                       /bin/mail
xterm                        /bin/tcsh
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 sh,ksh,csh,tcsh,bash
min_uid                      500
min_gid                      500
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
stat_log_time                48:00:00
max_unheard                  00:01:45
reschedule_unknown           00:00:00
loglevel                     log_info
administrator_mail           sean at duke.edu
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 none
reporting_params             accounting=true reporting=false \
                             flush_time=00:00:15 joblog=true sharelog=00:00:00
finished_jobs                100
gid_range                    20000-20100
qlogin_command               /home/csem/sean/qlogin_wrapper
qlogin_daemon                /usr/sbin/sshd -i
rlogin_command               /usr/bin/ssh -o ConnectionAttempts=8
rlogin_daemon                /usr/sbin/sshd -i
rsh_command                  /usr/bin/ssh -o ConnectionAttempts=8
rsh_daemon                   /usr/sbin/sshd -i
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        0
delegated_file_staging       false
reprioritize                 0



