[GE users] SDM GE adapter sve got in trouble

cbyun cbyun at ll.mit.edu
Mon Aug 24 19:53:54 BST 2009


I got some more information.

When I started the GE adapter service, the JGDI log showed the following access-denied error:

# sdmadm suc -c gesvc
comp  host            message
---------------------------------------
gesvc llgriddev.local startup triggered

# tail -f default/spool/qmaster/jgdi0.log
...
24/08/2009 14:46:54|11|jgdi.jni.EventClientImpl.close|W|Close of event client failed
                              java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThread)
                                java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
                                java.security.AccessController.checkPermission(AccessController.java:546)
                                java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
                                java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
                                java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
                                com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
                                com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
                                com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
                                com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
                                com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)

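To pull only the warning and error entries, along with their indented stack-trace lines, out of a long JGDI or SDM log, an awk filter along these lines may help. It is a minimal sketch, not an SGE or SDM tool. It assumes that each entry starts with a date and uses the `date|...|source|level|message` layout seen in the excerpts here, with the level letter (I, W, E) in the fourth `|`-separated field and continuation lines indented. The function name `jgdi_problems` is hypothetical.

```shell
# Hypothetical helper (not part of SGE/SDM): print W (warning) and
# E (error) entries from a JGDI/SDM log, plus their continuation lines.
# Assumes header lines begin with a digit (the date) and carry the level
# letter in the 4th '|'-separated field; stack-trace lines are indented.
jgdi_problems() {
    awk -F'|' '
        # A header line decides whether this entry is kept.
        /^[0-9]/ { keep = ($4 == "W" || $4 == "E") }
        # Print the header and its continuation lines while kept.
        keep { print }
    ' "$1"
}
```

For example: `jgdi_problems default/spool/qmaster/jgdi0.log`. Keying on the header line rather than grepping for `|W|` keeps each exception's trace attached to the entry that produced it.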
Thanks,
- Chansup


> -----Original Message-----
> From: Ryszard.Macidlowski at sun.com [mailto:Ryszard.Macidlowski at sun.com]
> Sent: Monday, August 24, 2009 1:09 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>
> Hi Chansup,
>
> From the error I see, I don't think it's an SDM adapter problem.
> During startup, the adapter connects to the qmaster through JGDI (the JMX
> thread) and retrieves data over that JGDI connection. What I can see in
> your stack trace is an IllegalStateException thrown by JGDI, and as long
> as JGDI throws this exception the service will not start and nothing can
> be done on the SDM side. I suppose you didn't make any modifications to
> ge_adapter_svc_config.xml. So either there is a problem with the qmaster
> (try restarting the qmaster and see whether you can start gesvc; I suppose
> you already tried this and it is not the problem), or there is a problem,
> possibly a bug, in JGDI. You may have customized SGE in a way that JGDI
> cannot report or process the customized values.
>
> So to clear the error, I suppose you could "undo" your customizations. I
> would suggest doing it step by step: after each step, try to start the SDM
> gesvc service to see whether it starts, which lets you track down the
> problematic change.
>
> The second approach would be to install a new SGE cell with the default
> setup and add your customizations step by step, again trying to start the
> SDM gesvc service after each step to track down the problematic change.
>
> Of course, the third approach would be to debug JGDI :)
>
> BTW, have you checked the JGDI logs for any information?
>
> Rys
>
> cbyun wrote:
> > Hi Michal,
> >
> > I tried the following steps, but they did not clear the issue.
> > Any suggestions on how to clear it?
> >
> > - Remove GE adapter service
> >         sdmadm rs -s gesvc -force
> >
> > - Shut down/start up SDM master service
> >         sdmadm sdj -all -h localhost
> >         sdmadm suj
> >
> > - Add GE adapter service
> >         sdmadm ags -h localhost -j cs_vm -s gesvc -f \
> >                   <path>/ge_adapter_svc_config.xml
> >
> > - Start up GE component
> >         sdmadm suc -c gesvc -h localhost
> >
> > I'm still getting the same error:
> >
> > 08/24/2009 12:20:08|16|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc: Starting Grid Engine service
> > 08/24/2009 12:20:08|16|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
> >     |set_object_attribute: set_list of property reportVariables failed
> >     |
> > 08/24/2009 12:20:08|17|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc: Error in startup procedure: Service gesvc: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
> >     |set_object_attribute: set_list of property reportVariables failed
> >     |
> >
> >
> > Thanks,
> > - Chansup
> >
> >
> >
> >> -----Original Message-----
> >> From: cbyun [mailto:cbyun at ll.mit.edu]
> >> Sent: Monday, August 24, 2009 9:51 AM
> >> To: users at gridengine.sunsource.net
> >> Subject: RE: [GE users] SDM GE adapter sve got in trouble
> >>
> >> Michal,
> >>
> >>
> >>> -----Original Message-----
> >>> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
> >>> Sent: Monday, August 24, 2009 9:23 AM
> >>> To: users at gridengine.sunsource.net
> >>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
> >>>
> >>> Chansup,
> >>>
> >>> actually, is something called "STU_name" part of your changes? The
> >>> error says that "STU_name" is not part of the descriptor, so I'd like
> >>> to know whether it is something you introduced or touched.
> >>>
> >>>
> >> No, I have nothing called "STU_name" in my configuration.
> >> Here is my configuration:
> >>
> >> # qconf -sconf
> >> #global:
> >> execd_spool_dir              /var/spool/sge
> >> mailer                       /bin/mail
> >> xterm                        /usr/bin/X11/xterm
> >> load_sensor                  none
> >> prolog                       none
> >> epilog                       none
> >> shell_start_mode             posix_compliant
> >> login_shells                 sh,ksh,csh,tcsh
> >> min_uid                      0
> >> min_gid                      0
> >> user_lists                   none
> >> xuser_lists                  none
> >> projects                     none
> >> xprojects                    none
> >> enforce_project              false
> >> enforce_user                 auto
> >> load_report_time             00:00:40
> >> max_unheard                  00:05:00
> >> reschedule_unknown           00:00:00
> >> loglevel                     log_warning
> >> administrator_mail           none
> >> set_token_cmd                none
> >> pag_cmd                      none
> >> token_extend_time            none
> >> shepherd_cmd                 none
> >> qmaster_params               none
> >> execd_params                 none
> >> reporting_params             accounting=true reporting=true \
> >>                              flush_time=00:00:15 joblog=true sharelog=00:00:00
> >> finished_jobs                100
> >> gid_range                    20000-20100
> >> qlogin_command               builtin
> >> qlogin_daemon                builtin
> >> rlogin_command               builtin
> >> rlogin_daemon                builtin
> >> rsh_command                  builtin
> >> rsh_daemon                   builtin
> >> max_aj_instances             2000
> >> max_aj_tasks                 75000
> >> max_u_jobs                   0
> >> max_jobs                     0
> >> max_advance_reservations     0
> >> auto_user_oticket            0
> >> auto_user_fshare             0
> >> auto_user_default_project    none
> >> auto_user_delete_time        86400
> >> delegated_file_staging       false
> >> reprioritize                 0
> >> jsv_url                      none
> >> libjvm_path                  /usr/java/latest/jre/lib/amd64/server/libjvm.so
> >> additional_jvm_args          -Xmx256m
> >> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
> >>
> >>
> >> # qconf -ssconf
> >> algorithm                         default
> >> schedule_interval                 0:2:0
> >> maxujobs                          0
> >> queue_sort_method                 load
> >> job_load_adjustments              NONE
> >> load_adjustment_decay_time        0:0:0
> >> load_formula                      np_load_avg
> >> schedd_job_info                   true
> >> flush_submit_sec                  2
> >> flush_finish_sec                  2
> >> params                            none
> >> reprioritize_interval             0:0:0
> >> halftime                          168
> >> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> >> compensation_factor               5.000000
> >> weight_user                       0.250000
> >> weight_project                    0.250000
> >> weight_department                 0.250000
> >> weight_job                        0.250000
> >> weight_tickets_functional         0
> >> weight_tickets_share              0
> >> share_override_tickets            TRUE
> >> share_functional_shares           TRUE
> >> max_functional_jobs_to_schedule   200
> >> report_pjob_tickets               FALSE
> >> max_pending_tasks_per_job         50
> >> halflife_decay_list               none
> >> policy_hierarchy                  OFS
> >> weight_ticket                     0.010000
> >> weight_waiting_time               0.000000
> >> weight_deadline                   3600000.000000
> >> weight_urgency                    0.100000
> >> weight_priority                   1.000000
> >> max_reservation                   0
> >> default_duration                  INFINITY
> >>
> >>
> >> # qconf -srqs
> >> {
> >>    name         host_slot_limit
> >>    description  Limit total number of slots per hosts (assume uniform machines)
> >>    enabled      TRUE
> >>    limit        hosts {@allhosts} to slots=2
> >> }
> >> {
> >>    name         max_u_jobs
> >>    description  max jobs per user
> >>    enabled      TRUE
> >>    limit        users {*} to slots=256
> >> }
> >>
> >>
> >> # for i in `qconf -sql`; do echo " ";echo Qname: $i; qconf -sq $i; done
> >>
> >> Qname: all.q
> >> qname                 all.q
> >> hostlist              @nohosts
> >> seq_no                0
> >> load_thresholds       np_load_avg=1.75
> >> suspend_thresholds    NONE
> >> nsuspend              1
> >> suspend_interval      00:05:00
> >> priority              0
> >> min_cpu_interval      00:05:00
> >> processors            UNDEFINED
> >> qtype                 BATCH INTERACTIVE
> >> ckpt_list             NONE
> >> pe_list               make
> >> rerun                 FALSE
> >> slots                 2
> >> tmpdir                /tmp
> >> shell                 /bin/csh
> >> prolog                NONE
> >> epilog                NONE
> >> shell_start_mode      posix_compliant
> >> starter_method        NONE
> >> suspend_method        NONE
> >> resume_method         NONE
> >> terminate_method      NONE
> >> notify                00:00:60
> >> owner_list            NONE
> >> user_lists            NONE
> >> xuser_lists           NONE
> >> subordinate_list      NONE
> >> complex_values        NONE
> >> projects              NONE
> >> xprojects             NONE
> >> calendar              NONE
> >> initial_state         default
> >> s_rt                  INFINITY
> >> h_rt                  INFINITY
> >> s_cpu                 INFINITY
> >> h_cpu                 INFINITY
> >> s_fsize               INFINITY
> >> h_fsize               INFINITY
> >> s_data                INFINITY
> >> h_data                INFINITY
> >> s_stack               INFINITY
> >> h_stack               INFINITY
> >> s_core                INFINITY
> >> h_core                INFINITY
> >> s_rss                 INFINITY
> >> h_rss                 INFINITY
> >> s_vmem                INFINITY
> >> h_vmem                INFINITY
> >>
> >> Qname: normal
> >> qname                 normal
> >> hostlist              @allhosts
> >> seq_no                0
> >> load_thresholds       np_load_avg=1.75
> >> suspend_thresholds    NONE
> >> nsuspend              1
> >> suspend_interval      00:05:00
> >> priority              0
> >> min_cpu_interval      00:05:00
> >> processors            UNDEFINED
> >> qtype                 BATCH INTERACTIVE
> >> ckpt_list             NONE
> >> pe_list               make
> >> rerun                 FALSE
> >> slots                 2
> >> tmpdir                /tmp
> >> shell                 /bin/csh
> >> prolog                NONE
> >> epilog                NONE
> >> shell_start_mode      posix_compliant
> >> starter_method        NONE
> >> suspend_method        NONE
> >> resume_method         NONE
> >> terminate_method      NONE
> >> notify                00:00:60
> >> owner_list            NONE
> >> user_lists            NONE
> >> xuser_lists           NONE
> >> subordinate_list      NONE
> >> complex_values        NONE
> >> projects              NONE
> >> xprojects             NONE
> >> calendar              NONE
> >> initial_state         default
> >> s_rt                  INFINITY
> >> h_rt                  INFINITY
> >> s_cpu                 INFINITY
> >> h_cpu                 INFINITY
> >> s_fsize               INFINITY
> >> h_fsize               INFINITY
> >> s_data                INFINITY
> >> h_data                INFINITY
> >> s_stack               INFINITY
> >> h_stack               INFINITY
> >> s_core                INFINITY
> >> h_core                INFINITY
> >> s_rss                 INFINITY
> >> h_rss                 INFINITY
> >> s_vmem                INFINITY
> >> h_vmem                INFINITY
> >>
> >> Qname: pmatlab
> >> qname                 pmatlab
> >> hostlist              @allhosts
> >> seq_no                0
> >> load_thresholds       np_load_avg=1.75
> >> suspend_thresholds    NONE
> >> nsuspend              1
> >> suspend_interval      00:05:00
> >> priority              0
> >> min_cpu_interval      00:05:00
> >> processors            UNDEFINED
> >> qtype                 BATCH INTERACTIVE
> >> ckpt_list             NONE
> >> pe_list               make
> >> rerun                 FALSE
> >> slots                 2
> >> tmpdir                /tmp
> >> shell                 /bin/csh
> >> prolog                NONE
> >> epilog                NONE
> >> shell_start_mode      posix_compliant
> >> starter_method        NONE
> >> suspend_method        NONE
> >> resume_method         NONE
> >> terminate_method      NONE
> >> notify                00:00:60
> >> owner_list            NONE
> >> user_lists            NONE
> >> xuser_lists           NONE
> >> subordinate_list      NONE
> >> complex_values        NONE
> >> projects              NONE
> >> xprojects             NONE
> >> calendar              NONE
> >> initial_state         default
> >> s_rt                  INFINITY
> >> h_rt                  INFINITY
> >> s_cpu                 INFINITY
> >> h_cpu                 INFINITY
> >> s_fsize               INFINITY
> >> h_fsize               INFINITY
> >> s_data                INFINITY
> >> h_data                INFINITY
> >> s_stack               INFINITY
> >> h_stack               INFINITY
> >> s_core                INFINITY
> >> h_core                INFINITY
> >> s_rss                 INFINITY
> >> h_rss                 INFINITY
> >> s_vmem                INFINITY
> >> h_vmem                INFINITY
> >>
> >>
> >> Currently no hosts are assigned to either host group, since none of the
> >> hosts are being used by the SGE adapter service:
> >>
> >> # qconf -shgrp @allhosts
> >> group_name @allhosts
> >> hostlist NONE
> >>
> >> # qconf -shgrp @nohosts
> >> group_name @nohosts
> >> hostlist NONE
> >>
> >> I also turned on the exclusive mode:
> >>
> >> # qconf -sc
> >> #name               shortcut   type        relop   requestable consumable default  urgency
> >> #------------------------------------------------------------------------------------------
> >> ...
> >> exclusive           excl       BOOL        EXCL    YES         YES        0        1000
> >>
> >>
> >> Thanks,
> >> - Chansup
> >>
> >>
> >>
> >>
> >>
> >>> M.
> >>>
> >>>
> >>> cbyun wrote:
> >>>
> >>>> Michal,
> >>>>
> >>>> Yes, I made a few changes in the SGE configuration.
> >>>>
> >>>> I added a couple of RQS rules, a couple of new cluster queues, and a new
> >>>> host group, @nohosts.
> >>>>
> >>>> Then, to prevent all.q from being used, I assigned the @nohosts group to
> >>>> all.q.
> >>>
> >>>> I believe the issue appeared after these customizations.
> >>>>
> >>>> Thanks,
> >>>> - Chansup
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
> >>>>> Sent: Monday, August 24, 2009 3:28 AM
> >>>>> To: users at gridengine.sunsource.net
> >>>>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
> >>>>>
> >>>>> Chansup,
> >>>>>
> >>>>> did your SGE change in any way? The error is coming from JGDI (the SGE
> >>>>> side). Did you do some kind of "upgrade/downgrade" of jgdi.jar? I have
> >>>>> not seen such an error before, so I will need to dig into it. I will let
> >>>>> you know once I find something.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Michal
> >>>>>
> >>>>> cbyun wrote:
> >>>>>
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Somehow my GE adapter service got into trouble and I can no longer
> >>>>>> start it:
> >>>>>
> >>>>>
> >>>>>> # sdmadm suc -c gesvc2
> >>>>>> comp   host            message
> >>>>>> ----------------------------------------
> >>>>>> gesvc2 llgriddev.local startup triggered
> >>>>>>
> >>>>>>
> >>>>>> 08/21/2009 16:25:11|20|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc2: Starting Grid Engine service
> >>>>>> 08/21/2009 16:25:11|20|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
> >>>>>>     |set_object_attribute: set_list of property reportVariables failed
> >>>>>>     |
> >>>>>> 08/21/2009 16:25:11|21|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc2: Error in startup procedure: Service gesvc2: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
> >>>>>>     |set_object_attribute: set_list of property reportVariables failed
> >>>>>>     |
> >>>>>>
> >>>
> >>>>>> Is there any way to clear up this error?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> - Chansup
> >>>>>>
> >>>>>> ------------------------------------------------------
> >>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213536
> >>>>>>
> >>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214028

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].



More information about the gridengine-users mailing list