[GE users] SDM GE adapter sve got in trouble

cbyun cbyun at ll.mit.edu
Mon Aug 24 17:25:23 BST 2009


Hi Michal,

I tried the following steps, but they did not clear the issue.
Any suggestions for clearing it?

- Remove GE adapter service
        sdmadm rs -s gesvc -force

- Shutdown/Startup SDM master service
        sdmadm sdj -all -h localhost
        sdmadm suj

- Add GE adapter service
        sdmadm ags -h localhost -j cs_vm -s gesvc -f \
                  <path>/ge_adapter_svc_config.xml

- Startup GE component
        sdmadm suc -c gesvc -h localhost
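
For repeatability, the four steps above could be wrapped in a small script. This is only a minimal sketch, assuming Python 3.8+ and that sdmadm is on the PATH; the function names (recovery_commands, run) are hypothetical, and the sdmadm arguments are copied verbatim from the steps above. The <path> placeholder is left as it is and must be supplied by the caller. By default nothing is executed; the commands are only printed.

```python
import shlex
import subprocess


def recovery_commands(config_path, service="gesvc", jvm="cs_vm", host="localhost"):
    """Return the sdmadm invocations from the steps above, in order."""
    return [
        # Remove GE adapter service
        ["sdmadm", "rs", "-s", service, "-force"],
        # Shutdown/Startup SDM master service
        ["sdmadm", "sdj", "-all", "-h", host],
        ["sdmadm", "suj"],
        # Add GE adapter service
        ["sdmadm", "ags", "-h", host, "-j", jvm, "-s", service, "-f", config_path],
        # Startup GE component
        ["sdmadm", "suc", "-c", service, "-h", host],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+ " + shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # The config path is a placeholder, exactly as in the steps above.
    run(recovery_commands("<path>/ge_adapter_svc_config.xml"))
```

Keeping the commands as argument lists rather than one shell string avoids quoting problems if the configuration path contains spaces.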

I'm still getting the same error:

# 08/24/2009 12:20:08|16|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc: Starting Grid Engine service
08/24/2009 12:20:08|16|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
                                                                      |set_object_attribute: set_list of property reportVariables failed
                                                                      |
08/24/2009 12:20:08|17|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc: Error in startup procedure: Service gesvc: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
                                                                      |set_object_attribute: set_list of property reportVariables failed
                                                                      |
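
To pick the error records out of a longer log, the pipe-separated lines above can be split into fields. This is a minimal sketch, not an official SDM tool: the field names (time, thread, source, level, message) are my assumptions based only on the layout of the lines shown, as is the reading of E as the error level. Lines that begin with whitespace and "|" are treated as continuations of the previous record.

```python
from datetime import datetime


def parse_sdm_log(text):
    """Split log text into records; field names are assumed, not documented."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        stripped = line.lstrip()
        if stripped.startswith("|"):
            # Continuation line: append its text to the previous record.
            extra = stripped[1:].strip()
            if records and extra:
                records[-1]["message"] += "\n" + extra
            continue
        # Drop a leading "# " such as the one on the first line above.
        parts = line.lstrip("# ").split("|", 4)
        if len(parts) != 5:
            continue
        stamp, thread, source, level, message = parts
        try:
            when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue  # not a record line in the expected layout
        records.append({
            "time": when,
            "thread": thread,
            "source": source,
            "level": level,
            "message": message,
        })
    return records


def errors(records):
    """Return only the records whose level field is 'E'."""
    return [r for r in records if r["level"] == "E"]
```

For the log above, errors(parse_sdm_log(text)) would return the single AbstractServiceAdapter record, with the set_object_attribute line appended to its message.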


Thanks,
- Chansup


> -----Original Message-----
> From: cbyun [mailto:cbyun at ll.mit.edu]
> Sent: Monday, August 24, 2009 9:51 AM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] SDM GE adapter sve got in trouble
>
> Michal,
>
> > -----Original Message-----
> > From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
> > Sent: Monday, August 24, 2009 9:23 AM
> > To: users at gridengine.sunsource.net
> > Subject: Re: [GE users] SDM GE adapter sve got in trouble
> >
> > Chansup,
> >
> > Actually, is something called "STU_name" part of your changes? The error
> > says that "STU_name" is not part of the descriptor, so I'd
> > like to know whether it is something you introduced or touched.
> >
>
> No, I have nothing called "STU_name" in my configuration.
> Here is my configuration:
>
> # qconf -sconf
> #global:
> execd_spool_dir              /var/spool/sge
> mailer                       /bin/mail
> xterm                        /usr/bin/X11/xterm
> load_sensor                  none
> prolog                       none
> epilog                       none
> shell_start_mode             posix_compliant
> login_shells                 sh,ksh,csh,tcsh
> min_uid                      0
> min_gid                      0
> user_lists                   none
> xuser_lists                  none
> projects                     none
> xprojects                    none
> enforce_project              false
> enforce_user                 auto
> load_report_time             00:00:40
> max_unheard                  00:05:00
> reschedule_unknown           00:00:00
> loglevel                     log_warning
> administrator_mail           none
> set_token_cmd                none
> pag_cmd                      none
> token_extend_time            none
> shepherd_cmd                 none
> qmaster_params               none
> execd_params                 none
> reporting_params             accounting=true reporting=true \
>                              flush_time=00:00:15 joblog=true sharelog=00:00:00
> finished_jobs                100
> gid_range                    20000-20100
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
> max_aj_instances             2000
> max_aj_tasks                 75000
> max_u_jobs                   0
> max_jobs                     0
> max_advance_reservations     0
> auto_user_oticket            0
> auto_user_fshare             0
> auto_user_default_project    none
> auto_user_delete_time        86400
> delegated_file_staging       false
> reprioritize                 0
> jsv_url                      none
> libjvm_path                  /usr/java/latest/jre/lib/amd64/server/libjvm.so
> additional_jvm_args          -Xmx256m
> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
>
>
> # qconf -ssconf
> algorithm                         default
> schedule_interval                 0:2:0
> maxujobs                          0
> queue_sort_method                 load
> job_load_adjustments              NONE
> load_adjustment_decay_time        0:0:0
> load_formula                      np_load_avg
> schedd_job_info                   true
> flush_submit_sec                  2
> flush_finish_sec                  2
> params                            none
> reprioritize_interval             0:0:0
> halftime                          168
> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> compensation_factor               5.000000
> weight_user                       0.250000
> weight_project                    0.250000
> weight_department                 0.250000
> weight_job                        0.250000
> weight_tickets_functional         0
> weight_tickets_share              0
> share_override_tickets            TRUE
> share_functional_shares           TRUE
> max_functional_jobs_to_schedule   200
> report_pjob_tickets               FALSE
> max_pending_tasks_per_job         50
> halflife_decay_list               none
> policy_hierarchy                  OFS
> weight_ticket                     0.010000
> weight_waiting_time               0.000000
> weight_deadline                   3600000.000000
> weight_urgency                    0.100000
> weight_priority                   1.000000
> max_reservation                   0
> default_duration                  INFINITY
>
>
> # qconf -srqs
> {
>    name         host_slot_limit
>    description  Limit total number of slots per hosts (assume uniform machines)
>    enabled      TRUE
>    limit        hosts {@allhosts} to slots=2
> }
> {
>    name         max_u_jobs
>    description  max jobs per user
>    enabled      TRUE
>    limit        users {*} to slots=256
> }
>
>
> # for i in `qconf -sql`; do echo " ";echo Qname: $i; qconf -sq $i; done
>
> Qname: all.q
> qname                 all.q
> hostlist              @nohosts
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make
> rerun                 FALSE
> slots                 2
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> Qname: normal
> qname                 normal
> hostlist              @allhosts
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make
> rerun                 FALSE
> slots                 2
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
> Qname: pmatlab
> qname                 pmatlab
> hostlist              @allhosts
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make
> rerun                 FALSE
> slots                 2
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
>
> Currently no hosts are assigned to either host group, since no hosts
> are being used by the SGE adapter service:
>
> # qconf -shgrp @allhosts
> group_name @allhosts
> hostlist NONE
>
> # qconf -shgrp @nohosts
> group_name @nohosts
> hostlist NONE
>
> I also turned on the exclusive mode:
>
> # qconf -sc
> #name               shortcut   type        relop   requestable consumable default  urgency
> #------------------------------------------------------------------------------------------
> ...
> exclusive           excl       BOOL        EXCL    YES         YES        0        1000
>
>
> Thanks,
> - Chansup
>
>
>
>
> > M.
> >
> >
> > cbyun wrote:
> > > Michal,
> > >
> > > Yes, I made a few changes in the SGE configuration.
> > >
> > > I added a couple of RQS rules, a couple of new cluster queues and a new
> > > hostgroup, @nohosts.
> > >
> > > Then, in order to keep all.q from being used, I assigned the @nohosts
> > > group to all.q.
> > >
> > > I believe the issue appeared after these customizations.
> > >
> > > Thanks,
> > > - Chansup
> > >
> > >
> > >
> > >> -----Original Message-----
> > >> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
> > >> Sent: Monday, August 24, 2009 3:28 AM
> > >> To: users at gridengine.sunsource.net
> > >> Subject: Re: [GE users] SDM GE adapter sve got in trouble
> > >>
> > >> Chansup,
> > >>
> > >> did your SGE change in any way? There is an error coming from jgdi
> > >> (from the SGE side). Did you do some kind of "upgrade/downgrade" of
> > >> jgdi.jar? I have not seen such an error before, so I will need to dig into
> > >> it - I will let you know once I have found something.
> > >>
> > >> Regards,
> > >>
> > >> Michal
> > >>
> > >> cbyun wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Somehow my GE adapter service got into trouble and I can't start it
> > >>> any more:
> > >>>
> > >>> # sdmadm suc -c gesvc2
> > >>> comp   host            message
> > >>> ----------------------------------------
> > >>> gesvc2 llgriddev.local startup triggered
> > >>>
> > >>>
> > >>> 08/21/2009 16:25:11|20|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc2: Starting Grid Engine service
> > >>> 08/21/2009 16:25:11|20|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
> > >>>                                                                       |set_object_attribute: set_list of property reportVariables failed
> > >>>                                                                       |
> > >>> 08/21/2009 16:25:11|21|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc2: Error in startup procedure: Service gesvc2: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
> > >>>                                                                       |set_object_attribute: set_list of property reportVariables failed
> > >>>                                                                       |
> > >>>
> > >>> Is there any way to clear up this error?
> > >>>
> > >>> Thanks,
> > >>> - Chansup
> > >>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213992

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


