[GE users] Problem setting up SGE6.2

rnirmal rnirmal7 at gmail.com
Wed Dec 17 23:48:18 GMT 2008


Did anyone get this resolved? I'm having the same problem with the 6.2 courtesy binaries.

Qmaster starts up OK and works fine until these messages start appearing in its messages file every 10 seconds:

12/12/2008 16:27:05|event_|hostname|E|no event client known with id 1 to process acknowledgements
12/12/2008 16:27:05|event_|hostname|E|no event client known with id 1 to modify

After that point, every job submitted stays in the qw state. Nothing ever happens until qmaster is restarted, at which point all the pending jobs get dispatched and run fine.
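
To pin down when the errors begin relative to the last job that was dispatched, the messages file can be scanned for these lines. What follows is only a minimal sketch: it assumes every line has the five pipe-separated fields seen in the two lines above, and the field names (when, module, host, level, text) and function names are my own labels, not anything defined by Grid Engine.

```python
from collections import Counter
from datetime import datetime

# Sample lines copied from the qmaster messages excerpt above.
SAMPLE = """\
12/12/2008 16:27:05|event_|hostname|E|no event client known with id 1 to process acknowledgements
12/12/2008 16:27:05|event_|hostname|E|no event client known with id 1 to modify
"""

# Prefix shared by both error lines in the excerpt.
ERROR_PREFIX = "no event client known with id"


def parse_line(line):
    """Split one line into five fields; return None if it does not fit.

    The layout (timestamp|module|host|level|text) is inferred from the
    excerpt, so treat it as an assumption.
    """
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    stamp, module, host, level, text = parts
    try:
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return {"when": when, "module": module, "host": host,
            "level": level, "text": text}


def event_client_errors(lines):
    """Return the parsed error records that match ERROR_PREFIX."""
    records = (parse_line(line) for line in lines)
    return [r for r in records
            if r and r["level"] == "E" and r["text"].startswith(ERROR_PREFIX)]


def summarize(errors):
    """Count how often each distinct error text occurs."""
    return Counter(r["text"] for r in errors)


if __name__ == "__main__":
    errors = event_client_errors(SAMPLE.splitlines())
    if errors:
        print("first occurrence:", errors[0]["when"])
    for text, count in summarize(errors).items():
        print(count, text)
```

To run it against a real installation, replace SAMPLE.splitlines() with an open file handle on the qmaster messages file; where that file lives depends on how spooling was configured.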

Any thoughts or clues would be helpful.

Nirmal

> [sumyee@cviant32 qmaster]$ qstat -f
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@cviant39.cv.hp.com       BP    0/0/4          0.00     lx24-amd64
> ---------------------------------------------------------------------------------
> all.q@cviant32.cv.hp.com       BP    0/0/4          0.01     lx24-amd64
> 
> ############################################################################
>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> ############################################################################
>       6 0.00000 simple.sh  sumyee       qw    09/17/2008 13:39:03     1
> 
> 
> -Sum Yee
> 
> -----Original Message-----
> From: Chris Dagdigian [mailto:dag at sonsorol.org]
> Sent: Wednesday, September 17, 2008 2:32 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Problem setting up SGE6.2
> 
> 
> Is there any output from the "qstat -f" command?
> 
> -Chris
> 
> 
> 
> On Sep 17, 2008, at 5:14 PM, Lai, Sum Yee wrote:
> 
> > Yes.  I have 2 execution hosts set up.  I have verified the daemons
> > are running.
> >
> > Sum Yee
> >
> > -----Original Message-----
> > From: Darin Perusich [mailto:Darin.Perusich at cognigencorp.com]
> > Sent: Wednesday, September 17, 2008 2:10 PM
> > To: users at gridengine.sunsource.net
> > Subject: Re: [GE users] Problem setting up SGE6.2
> >
> > Have you set up any execution hosts?
> >
> > Lai, Sum Yee wrote:
> >> Hello!
> >>
> >> We have just set up SGE6.2 in a test environment.  When I submit a
> >> test job, the job doesn't get dispatched.  The message I get from
> >> qstat is:
> >>
> >> Can not get job info messages, scheduler is not available.
> >> ==============================================================
> >> job_number:                 6
> >> exec_file:                  job_scripts/6
> >> submission_time:            Wed Sep 17 13:39:03 2008
> >> owner:                      sumyee
> >> uid:                        10771
> >> group:                      users
> >> gid:                        100
> >> sge_o_home:                 /home/sumyee
> >> sge_o_log_name:             sumyee
> >> sge_o_path:                 /usr/local/GridEngine/bin/lx24-amd64
> >> sge_o_shell:                /bin/bash
> >> sge_o_workdir:              /home/sumyee/sge/test
> >> sge_o_host:                 cviant32
> >> account:                    sge
> >> mail_list:                  sumyee at cviant32.cv.hp.com
> >> notify:                     FALSE
> >> job_name:                   simple.sh
> >> jobshare:                   0
> >> shell_list:                 NONE:/bin/sh
> >> env_list:
> >> script_file:                simple.sh
> >>
> >> I have verified that sge_qmaster is running on the master host.  My
> >> understanding is that sge_schedd is now incorporated into qmaster so
> >> that it doesn't run separately.  If sge_qmaster is running, why isn't
> >> the scheduler available?
> >>
> >> In the message file for qmaster, I get these two errors every 10
> >> seconds:
> >>
> >> 09/15/2008 20:06:18|event_|cviant41|E|no event client known with id 1 to modify
> >> 09/15/2008 20:06:28|event_|cviant41|E|no event client known with id 1 to process acknowledgements
> >>
> >> I am not sure if the two problems are related.  Can anyone give me
> >> any suggestions on what may be causing these?
> >>
> >> My configuration is pretty much default at this point.  Here it
> >> is anyway:
> >>
> >> [sumyee@cviant32 qmaster]$ qconf -sconf
> >> #global:
> >> execd_spool_dir              /usr/local/GridEngine/default/spool
> >> mailer                       /bin/mail
> >> xterm                        /usr/bin/X11/xterm
> >> load_sensor                  none
> >> prolog                       none
> >> epilog                       none
> >> shell_start_mode             unix_behavior
> >> login_shells                 sh,ksh,csh,tcsh
> >> min_uid                      0
> >> min_gid                      0
> >> user_lists                   none
> >> xuser_lists                  none
> >> projects                     none
> >> xprojects                    none
> >> enforce_project              false
> >> enforce_user                 auto
> >> load_report_time             00:00:40
> >> max_unheard                  00:05:00
> >> reschedule_unknown           00:00:00
> >> loglevel                     log_warning
> >> administrator_mail           sum-yee.lai at hp.com
> >> set_token_cmd                none
> >> pag_cmd                      none
> >> token_extend_time            none
> >> shepherd_cmd                 none
> >> qmaster_params               none
> >> execd_params                 none
> >> reporting_params             accounting=true reporting=true \
> >>                              flush_time=00:00:15 joblog=true sharelog=00:00:00
> >> finished_jobs                100
> >> gid_range                    20000-30000
> >> qlogin_command               builtin
> >> qlogin_daemon                builtin
> >> rlogin_command               builtin
> >> rlogin_daemon                builtin
> >> rsh_command                  builtin
> >> rsh_daemon                   builtin
> >> max_aj_instances             2000
> >> max_aj_tasks                 75000
> >> max_u_jobs                   0
> >> max_jobs                     0
> >> max_advance_reservations     0
> >> auto_user_oticket            0
> >> auto_user_fshare             0
> >> auto_user_default_project    none
> >> auto_user_delete_time        86400
> >> delegated_file_staging       false
> >> reprioritize                 false
> >>
> >>
> >> [sumyee@cviant32 qmaster]$ qconf -ssconf
> >> algorithm                         default
> >> schedule_interval                 0:0:15
> >> maxujobs                          0
> >> queue_sort_method                 load
> >> job_load_adjustments              NONE
> >> load_adjustment_decay_time        00:15:00
> >> load_formula                      np_load_avg
> >> schedd_job_info                   true
> >> flush_submit_sec                  5
> >> flush_finish_sec                  0
> >> params                            none
> >> reprioritize_interval             0:0:0
> >> halftime                          168
> >> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> >> compensation_factor               5.000000
> >> weight_user                       0.250000
> >> weight_project                    0.250000
> >> weight_department                 0.250000
> >> weight_job                        0.250000
> >> weight_tickets_functional         0
> >> weight_tickets_share              0
> >> share_override_tickets            TRUE
> >> share_functional_shares           TRUE
> >> max_functional_jobs_to_schedule   200
> >> report_pjob_tickets               TRUE
> >> max_pending_tasks_per_job         50
> >> halflife_decay_list               none
> >> policy_hierarchy                  OFS
> >> weight_ticket                     0.010000
> >> weight_waiting_time               0.000000
> >> weight_deadline                   3600000.000000
> >> weight_urgency                    0.100000
> >> weight_priority                   1.000000
> >> max_reservation                   0
> >> default_duration                  INFINITY
> >>
> >> Thanks!
> >>
> >> Sum Yee
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>
> >
> > --
> > Darin Perusich
> > Unix Systems Administrator
> > Cognigen Corporation
> > 395 Youngs Rd.
> > Williamsville, NY 14221
> > Phone: 716-633-3463
> > Email: darinper at cognigencorp.com
> >
> >
> 
> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=93061

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


