[GE users] Problem setting up SGE6.2

Lai, Sum Yee sum-yee.lai at hp.com
Wed Sep 17 22:06:00 BST 2008


We have just set up SGE6.2 in a test environment.  When I submit a test job, it never gets dispatched.  The message I get from qstat is:

Can not get job info messages, scheduler is not available.
job_number:                 6
exec_file:                  job_scripts/6
submission_time:            Wed Sep 17 13:39:03 2008
owner:                      sumyee
uid:                        10771
group:                      users
gid:                        100
sge_o_home:                 /home/sumyee
sge_o_log_name:             sumyee
sge_o_path:                 /usr/local/GridEngine/bin/lx24-amd64
sge_o_shell:                /bin/bash
sge_o_workdir:              /home/sumyee/sge/test
sge_o_host:                 cviant32
account:                    sge
mail_list:                  sumyee at cviant32.cv.hp.com
notify:                     FALSE
job_name:                   simple.sh
jobshare:                   0
shell_list:                 NONE:/bin/sh
script_file:                simple.sh

I have verified that sge_qmaster is running on the master host.  My understanding is that sge_schedd is now incorporated into sge_qmaster and no longer runs as a separate process.  If sge_qmaster is running, why isn't the scheduler available?

In the qmaster messages file, I get these two errors every 10 seconds:

09/15/2008 20:06:18|event_|cviant41|E|no event client known with id 1 to modify
09/15/2008 20:06:28|event_|cviant41|E|no event client known with id 1 to process acknowledgements

I am not sure if the two problems are related.  Can anyone give me any suggestions on what may be causing these?
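
To check whether those errors really recur at a fixed interval, the two log lines quoted above can be parsed with a short script. This is only a rough sketch: it assumes each line of the messages file has five pipe-separated fields (timestamp, source, host, level, text), which is inferred from the two sample lines and not from any documentation. The names parse_line and error_entries are made up for illustration.

```python
from datetime import datetime

# The two lines quoted from the qmaster messages file.
SAMPLE = """\
09/15/2008 20:06:18|event_|cviant41|E|no event client known with id 1 to modify
09/15/2008 20:06:28|event_|cviant41|E|no event client known with id 1 to process acknowledgements
"""


def parse_line(line):
    """Split one log line into five fields (assumed layout, see above)."""
    # maxsplit=4 keeps any '|' inside the message text intact.
    stamp, source, host, level, text = line.rstrip("\n").split("|", 4)
    return {
        "time": datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"),
        "source": source,  # assumed to be a (truncated) thread or module name
        "host": host,
        "level": level,    # "E" is taken to mean error
        "text": text,
    }


def error_entries(lines):
    """Return the parsed entries whose level field is 'E'."""
    parsed = (parse_line(l) for l in lines if l.strip())
    return [e for e in parsed if e["level"] == "E"]


entries = error_entries(SAMPLE.splitlines())

# Seconds elapsed between each pair of consecutive error entries.
gaps = [(b["time"] - a["time"]).total_seconds()
        for a, b in zip(entries, entries[1:])]

for e in entries:
    print(e["time"], e["host"], e["text"])
print("seconds between entries:", gaps)
```

Running the same functions over the whole messages file instead of SAMPLE would show whether the 10-second spacing holds throughout.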

My configuration is pretty much default at this point.  Here it is anyway:
[sumyee@cviant32 qmaster]$ qconf -sconf
execd_spool_dir              /usr/local/GridEngine/default/spool
mailer                       /bin/mail
xterm                        /usr/bin/X11/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             unix_behavior
login_shells                 sh,ksh,csh,tcsh
min_uid                      0
min_gid                      0
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           sum-yee.lai at hp.com
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 none
reporting_params             accounting=true reporting=true \
                             flush_time=00:00:15 joblog=true sharelog=00:00:00
finished_jobs                100
gid_range                    20000-30000
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
max_advance_reservations     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 false

[sumyee@cviant32 qmaster]$ qconf -ssconf
algorithm                         default
schedule_interval                 0:0:15
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              NONE
load_adjustment_decay_time        00:15:00
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  5
flush_finish_sec                  0
params                            none
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         0
weight_tickets_share              0
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               TRUE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.010000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.100000
weight_priority                   1.000000
max_reservation                   0
default_duration                  INFINITY


Sum Yee
