[GE users] SDM GE adapter sve got in trouble

easymf michal.bachorik at sun.com
Tue Aug 25 15:44:10 BST 2009


Chansup,

thanks for investigating. Of course, please go ahead and file an issue.

Regards,

Michal
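
For the issue report, it may help to pull only the warning and error records out of jgdi0.log. Below is a minimal sketch in Python, assuming the "date time|thread|source|level|message" layout seen in the log lines quoted below. The names LogRecord, parse_line, and problems are made up for this illustration and are not part of JGDI or SDM.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: str
    thread: int
    source: str
    level: str
    message: str


# Assumed layout: "DD/MM/YYYY HH:MM:SS|thread|source|level|message".
# The message itself may contain further "|" characters.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|"
    r"(?P<thread>\d+)\|(?P<source>[^|]+)\|(?P<level>[A-Z])\|(?P<msg>.*)$"
)


def parse_line(line: str) -> Optional[LogRecord]:
    """Return a LogRecord, or None for continuation lines such as stack frames."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogRecord(m["ts"], int(m["thread"]), m["source"], m["level"], m["msg"])


def problems(lines: Iterable[str], levels=("W", "E")) -> List[LogRecord]:
    """Keep only the records whose level is in `levels` (warnings and errors by default)."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r.level in levels]


if __name__ == "__main__":
    sample = [
        "25/08/2009 11:04:26|17|jni.EventClientImpl.fillEvents|W|content field STU_name not found in descriptor",
        "set_object_attribute: set_list of property reportVariables failed",
    ]
    for rec in problems(sample):
        print(rec.timestamp, rec.level, rec.source, rec.message)
```

Continuation lines (stack frames, the set_object_attribute line) are skipped rather than attached to the preceding record, which keeps the sketch short; a fuller version would probably want to keep them.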


cbyun wrote:
> I think I found the configuration change that caused the issue.
>
> If I add the following report_variables for ARCo, the qmaster JGDI interface complains:
>
> # qconf -me global
> ...
> report_variables      cpu,np_load_avg,mem_free,virtual_free
>
> Then,
>
> 25/08/2009 11:04:26|17|jni.EventClientImpl.fillEvents|W|content field STU_name not found in descriptor
> set_object_attribute: set_list of property reportVariables failed
>
> I think this is a serious bug that prevents the GE adapter service from starting.  Any comments?  Should I file an issue?
>
> Also, as a minor issue, when I shut down the SDM master process, the qmaster JGDI log shows the following warning:
>
> 25/08/2009 10:58:24|16|jgdi.jni.EventClientImpl.close|W|Close of event client failed
>                               java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThread)
>                                 java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
>                                 java.security.AccessController.checkPermission(AccessController.java:546)
>                                 java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
>                                 java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
>                                 java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
>                                 com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
>                                 com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
>                                 com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
>                                 com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
>                                 com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)
>
>
>
>> -----Original Message-----
>> From: cbyun [mailto:cbyun at ll.mit.edu]
>> Sent: Tuesday, August 25, 2009 9:41 AM
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] SDM GE adapter sve got in trouble
>>
>> I think I removed all the SGE customizations, but I'm still getting the
>> same error when I start the SDM master daemon.
>>
>> According to the strace output for the qmaster process, there was a
>> segmentation violation.  Would this indicate an issue with the JGDI
>> interface?
>>
>>
>> # sdmadm sdj -all -h localhost
>> jvm   host            result  message
>> -------------------------------------
>> cs_vm llgriddev.local STOPPED
>>
>> # sdmadm suj
>> jvm   host            result  message
>> ------------------------------------------------------------
>> cs_vm llgriddev.local STARTED
>>
>>
>> From default/spool/qmaster/jgdi0.log:
>>
>> 25/08/2009 09:24:24|13|jgdi.jni.EventClientImpl.close|W|Close of event client failed
>>                               java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThread)
>>                                 java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
>>                                 java.security.AccessController.checkPermission(AccessController.java:546)
>>                                 java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
>>                                 java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
>>                                 java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
>>                                 com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
>>                                 com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
>>                                 com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
>>                                 com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
>>                                 com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)
>>
>>
>> From strace output for the qmaster process:
>> 09:24:24 clock_gettime(CLOCK_MONOTONIC, {4917022, 451365236}) = 0 <0.000011>
>> 09:24:24 mprotect(0x2aaaab81b000, 4096, PROT_READ) = 0 <0.000019>
>> 09:24:24 mprotect(0x2aaaab81b000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC09:24:24 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
>> ) = 0 <0.000102>
>> 09:24:24 futex(0x2aaac4537a14, 0x80 /* FUTEX_??? */, 8509:24:24 futex(0x2aaac4537a14, 0x85 /* FUTEX_??? */, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000042>
>> ) = 0 <0.000031>
>> 09:24:24 futex(0x2b7e22dd3428, 0x80 /* FUTEX_??? */, 209:24:24 futex(0x2b7e22dd3428, 0x81 /* FUTEX_??? */, 1) = -1 EAGAIN (Resource temporarily unavailable) <0.000041>
>> ) = 0 <0.000028>
>> 09:24:24 futex(0x2b7e22dd3428, 0x81 /* FUTEX_??? */, 109:24:24 mprotect(0x2aaaab81a000, 4096, PROT_NONE) = 0 <0.000042>
>> ) = 0 <0.000028>
>> 09:24:24 rt_sigreturn(0)                = 46912499900416 <0.000013>
>>
>> Thanks,
>> - Chansup
>>
>>
>>
>>
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: cbyun [mailto:cbyun at ll.mit.edu]
>>> Sent: Monday, August 24, 2009 2:54 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: RE: [GE users] SDM GE adapter sve got in trouble
>>>
>>> I got some more information.
>>>
>>> When I started the GE adapter service, the JGDI log showed the following
>>> access denied error.
>>>
>>> # sdmadm suc -c gesvc
>>> comp  host            message
>>> ---------------------------------------
>>> gesvc llgriddev.local startup triggered
>>>
>>> # tail -f default/spool/qmaster/jgdi0.log
>>> ...
>>> 24/08/2009 14:46:54|11|jgdi.jni.EventClientImpl.close|W|Close of event client failed
>>>                               java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThread)
>>>                                 java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
>>>                                 java.security.AccessController.checkPermission(AccessController.java:546)
>>>                                 java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
>>>                                 java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
>>>                                 java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
>>>                                 com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
>>>                                 com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
>>>                                 com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
>>>                                 com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
>>>                                 com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)
>>>
>>> Thanks,
>>> - Chansup
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ryszard.Macidlowski at sun.com [mailto:Ryszard.Macidlowski at sun.com]
>>>> Sent: Monday, August 24, 2009 1:09 PM
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>>>>
>>>> Hi Chansup,
>>>>
>>>> From the error that I see, I don't think that it's an SDM adapter problem.
>>>> During startup, the adapter connects to the qmaster (JMX thread) using
>>>> JGDI and retrieves data over this JGDI connection. What I can see in your
>>>> stack trace is that you get an IllegalStateException from JGDI. As long as
>>>> JGDI throws this exception, the service will not start, and nothing can be
>>>> done on the SDM side. I suppose you didn't make any modifications in
>>>> ge_adapter_svc_config.xml. So either there is a problem with the qmaster
>>>> (try restarting the qmaster and see whether you are able to start gesvc;
>>>> I suppose you already tried this and it is not the problem), or there is
>>>> a problem, possibly a bug, in JGDI. You may have customized SGE in such a
>>>> way that JGDI cannot report or process the customized values.
>>>>
>>>> So to clear the error, I suppose you could "undo" your customizations. I
>>>> would suggest doing it step by step (after each step, try to start the SDM
>>>> gesvc service to see if it starts, and so track down the problematic
>>>> change).
>>>>
>>>> The second approach would be to install a new SGE cell with the default
>>>> setup and add the customizations step by step (after each step, try to
>>>> start the SDM gesvc service to see if it starts, and so track down the
>>>> problematic change).
>>>>
>>>> Of course, the third approach would be to debug JGDI :)
>>>>
>>>> BTW, have you checked the JGDI logs for any information?
>>>>
>>>> Rys
>>>>
>>>> cbyun wrote:
>>>>
>>>>> Hi Michal,
>>>>>
>>>>> I tried the following steps but it still failed to clear the issue.
>>>>> Any suggestion to clear the issue?
>>>>>
>>>>> - Remove GE adapter service
>>>>>         sdmadm rs -s gesvc -force
>>>>>
>>>>> - Shut down and start up the SDM master service
>>>>>         sdmadm sdj -all -h localhost
>>>>>         sdmadm suj
>>>>>
>>>>> - Add GE adapter service
>>>>>         sdmadm ags -h localhost -j cs_vm -s gesvc -f \
>>>>>                   <path>/ge_adapter_svc_config.xml
>>>>>
>>>>> - Startup GE component
>>>>>         sdmadm suc -c gesvc -h localhost
>>>>>
>>>>> I'm still getting the same error:
>>>>>
>>>>> 08/24/2009 12:20:08|16|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc: Starting Grid Engine service
>>>>> 08/24/2009 12:20:08|16|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>>>> |set_object_attribute: set_list of property reportVariables failed
>>>>> |
>>>>> 08/24/2009 12:20:08|17|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc: Error in startup procedure: Service gesvc: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>>>> |set_object_attribute: set_list of property reportVariables failed
>>>>> |
>>>>>
>>>>> Thanks,
>>>>> - Chansup
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: cbyun [mailto:cbyun at ll.mit.edu]
>>>>>> Sent: Monday, August 24, 2009 9:51 AM
>>>>>> To: users at gridengine.sunsource.net
>>>>>> Subject: RE: [GE users] SDM GE adapter sve got in trouble
>>>>>>
>>>>>> Michal,
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
>>>>>>> Sent: Monday, August 24, 2009 9:23 AM
>>>>>>> To: users at gridengine.sunsource.net
>>>>>>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>>>>>>>
>>>>>>> Chansup,
>>>>>>>
>>>>>>> actually, is something called "STU_name" part of your changes? The error
>>>>>>> says something about "STU_name" not being part of the descriptor, so I'd
>>>>>>> like to know whether it is something you introduced or touched.
>>>>>>>
>>>>>> No, I have nothing called "STU_name" in my configuration.
>>>>>> Here is my configuration:
>>>>>>
>>>>>> # qconf -sconf
>>>>>> #global:
>>>>>> execd_spool_dir              /var/spool/sge
>>>>>> mailer                       /bin/mail
>>>>>> xterm                        /usr/bin/X11/xterm
>>>>>> load_sensor                  none
>>>>>> prolog                       none
>>>>>> epilog                       none
>>>>>> shell_start_mode             posix_compliant
>>>>>> login_shells                 sh,ksh,csh,tcsh
>>>>>> min_uid                      0
>>>>>> min_gid                      0
>>>>>> user_lists                   none
>>>>>> xuser_lists                  none
>>>>>> projects                     none
>>>>>> xprojects                    none
>>>>>> enforce_project              false
>>>>>> enforce_user                 auto
>>>>>> load_report_time             00:00:40
>>>>>> max_unheard                  00:05:00
>>>>>> reschedule_unknown           00:00:00
>>>>>> loglevel                     log_warning
>>>>>> administrator_mail           none
>>>>>> set_token_cmd                none
>>>>>> pag_cmd                      none
>>>>>> token_extend_time            none
>>>>>> shepherd_cmd                 none
>>>>>> qmaster_params               none
>>>>>> execd_params                 none
>>>>>> reporting_params             accounting=true reporting=true \
>>>>>>                              flush_time=00:00:15 joblog=true sharelog=00:00:00
>>>>>> finished_jobs                100
>>>>>> gid_range                    20000-20100
>>>>>> qlogin_command               builtin
>>>>>> qlogin_daemon                builtin
>>>>>> rlogin_command               builtin
>>>>>> rlogin_daemon                builtin
>>>>>> rsh_command                  builtin
>>>>>> rsh_daemon                   builtin
>>>>>> max_aj_instances             2000
>>>>>> max_aj_tasks                 75000
>>>>>> max_u_jobs                   0
>>>>>> max_jobs                     0
>>>>>> max_advance_reservations     0
>>>>>> auto_user_oticket            0
>>>>>> auto_user_fshare             0
>>>>>> auto_user_default_project    none
>>>>>> auto_user_delete_time        86400
>>>>>> delegated_file_staging       false
>>>>>> reprioritize                 0
>>>>>> jsv_url                      none
>>>>>> libjvm_path                  /usr/java/latest/jre/lib/amd64/server/libjvm.so
>>>>>> additional_jvm_args          -Xmx256m
>>>>>> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
>>>>>>
>>>>>>
>>>>>> # qconf -ssconf
>>>>>> algorithm                         default
>>>>>> schedule_interval                 0:2:0
>>>>>> maxujobs                          0
>>>>>> queue_sort_method                 load
>>>>>> job_load_adjustments              NONE
>>>>>> load_adjustment_decay_time        0:0:0
>>>>>> load_formula                      np_load_avg
>>>>>> schedd_job_info                   true
>>>>>> flush_submit_sec                  2
>>>>>> flush_finish_sec                  2
>>>>>> params                            none
>>>>>> reprioritize_interval             0:0:0
>>>>>> halftime                          168
>>>>>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>>>>>> compensation_factor               5.000000
>>>>>> weight_user                       0.250000
>>>>>> weight_project                    0.250000
>>>>>> weight_department                 0.250000
>>>>>> weight_job                        0.250000
>>>>>> weight_tickets_functional         0
>>>>>> weight_tickets_share              0
>>>>>> share_override_tickets            TRUE
>>>>>> share_functional_shares           TRUE
>>>>>> max_functional_jobs_to_schedule   200
>>>>>> report_pjob_tickets               FALSE
>>>>>> max_pending_tasks_per_job         50
>>>>>> halflife_decay_list               none
>>>>>> policy_hierarchy                  OFS
>>>>>> weight_ticket                     0.010000
>>>>>> weight_waiting_time               0.000000
>>>>>> weight_deadline                   3600000.000000
>>>>>> weight_urgency                    0.100000
>>>>>> weight_priority                   1.000000
>>>>>> max_reservation                   0
>>>>>> default_duration                  INFINITY
>>>>>>
>>>>>>
>>>>>> # qconf -srqs
>>>>>> {
>>>>>>    name         host_slot_limit
>>>>>>    description  Limit total number of slots per hosts (assume uniform machines)
>>>>>>    enabled      TRUE
>>>>>>    limit        hosts {@allhosts} to slots=2
>>>>>> }
>>>>>> {
>>>>>>    name         max_u_jobs
>>>>>>    description  max jobs per user
>>>>>>    enabled      TRUE
>>>>>>    limit        users {*} to slots=256
>>>>>> }
>>>>>>
>>>>>>
>>>>>> # for i in `qconf -sql`; do echo " ";echo Qname: $i; qconf -sq $i; done
>>>>>> Qname: all.q
>>>>>> qname                 all.q
>>>>>> hostlist              @nohosts
>>>>>> seq_no                0
>>>>>> load_thresholds       np_load_avg=1.75
>>>>>> suspend_thresholds    NONE
>>>>>> nsuspend              1
>>>>>> suspend_interval      00:05:00
>>>>>> priority              0
>>>>>> min_cpu_interval      00:05:00
>>>>>> processors            UNDEFINED
>>>>>> qtype                 BATCH INTERACTIVE
>>>>>> ckpt_list             NONE
>>>>>> pe_list               make
>>>>>> rerun                 FALSE
>>>>>> slots                 2
>>>>>> tmpdir                /tmp
>>>>>> shell                 /bin/csh
>>>>>> prolog                NONE
>>>>>> epilog                NONE
>>>>>> shell_start_mode      posix_compliant
>>>>>> starter_method        NONE
>>>>>> suspend_method        NONE
>>>>>> resume_method         NONE
>>>>>> terminate_method      NONE
>>>>>> notify                00:00:60
>>>>>> owner_list            NONE
>>>>>> user_lists            NONE
>>>>>> xuser_lists           NONE
>>>>>> subordinate_list      NONE
>>>>>> complex_values        NONE
>>>>>> projects              NONE
>>>>>> xprojects             NONE
>>>>>> calendar              NONE
>>>>>> initial_state         default
>>>>>> s_rt                  INFINITY
>>>>>> h_rt                  INFINITY
>>>>>> s_cpu                 INFINITY
>>>>>> h_cpu                 INFINITY
>>>>>> s_fsize               INFINITY
>>>>>> h_fsize               INFINITY
>>>>>> s_data                INFINITY
>>>>>> h_data                INFINITY
>>>>>> s_stack               INFINITY
>>>>>> h_stack               INFINITY
>>>>>> s_core                INFINITY
>>>>>> h_core                INFINITY
>>>>>> s_rss                 INFINITY
>>>>>> h_rss                 INFINITY
>>>>>> s_vmem                INFINITY
>>>>>> h_vmem                INFINITY
>>>>>>
>>>>>> Qname: normal
>>>>>> qname                 normal
>>>>>> hostlist              @allhosts
>>>>>> seq_no                0
>>>>>> load_thresholds       np_load_avg=1.75
>>>>>> suspend_thresholds    NONE
>>>>>> nsuspend              1
>>>>>> suspend_interval      00:05:00
>>>>>> priority              0
>>>>>> min_cpu_interval      00:05:00
>>>>>> processors            UNDEFINED
>>>>>> qtype                 BATCH INTERACTIVE
>>>>>> ckpt_list             NONE
>>>>>> pe_list               make
>>>>>> rerun                 FALSE
>>>>>> slots                 2
>>>>>> tmpdir                /tmp
>>>>>> shell                 /bin/csh
>>>>>> prolog                NONE
>>>>>> epilog                NONE
>>>>>> shell_start_mode      posix_compliant
>>>>>> starter_method        NONE
>>>>>> suspend_method        NONE
>>>>>> resume_method         NONE
>>>>>> terminate_method      NONE
>>>>>> notify                00:00:60
>>>>>> owner_list            NONE
>>>>>> user_lists            NONE
>>>>>> xuser_lists           NONE
>>>>>> subordinate_list      NONE
>>>>>> complex_values        NONE
>>>>>> projects              NONE
>>>>>> xprojects             NONE
>>>>>> calendar              NONE
>>>>>> initial_state         default
>>>>>> s_rt                  INFINITY
>>>>>> h_rt                  INFINITY
>>>>>> s_cpu                 INFINITY
>>>>>> h_cpu                 INFINITY
>>>>>> s_fsize               INFINITY
>>>>>> h_fsize               INFINITY
>>>>>> s_data                INFINITY
>>>>>> h_data                INFINITY
>>>>>> s_stack               INFINITY
>>>>>> h_stack               INFINITY
>>>>>> s_core                INFINITY
>>>>>> h_core                INFINITY
>>>>>> s_rss                 INFINITY
>>>>>> h_rss                 INFINITY
>>>>>> s_vmem                INFINITY
>>>>>> h_vmem                INFINITY
>>>>>>
>>>>>> Qname: pmatlab
>>>>>> qname                 pmatlab
>>>>>> hostlist              @allhosts
>>>>>> seq_no                0
>>>>>> load_thresholds       np_load_avg=1.75
>>>>>> suspend_thresholds    NONE
>>>>>> nsuspend              1
>>>>>> suspend_interval      00:05:00
>>>>>> priority              0
>>>>>> min_cpu_interval      00:05:00
>>>>>> processors            UNDEFINED
>>>>>> qtype                 BATCH INTERACTIVE
>>>>>> ckpt_list             NONE
>>>>>> pe_list               make
>>>>>> rerun                 FALSE
>>>>>> slots                 2
>>>>>> tmpdir                /tmp
>>>>>> shell                 /bin/csh
>>>>>> prolog                NONE
>>>>>> epilog                NONE
>>>>>> shell_start_mode      posix_compliant
>>>>>> starter_method        NONE
>>>>>> suspend_method        NONE
>>>>>> resume_method         NONE
>>>>>> terminate_method      NONE
>>>>>> notify                00:00:60
>>>>>> owner_list            NONE
>>>>>> user_lists            NONE
>>>>>> xuser_lists           NONE
>>>>>> subordinate_list      NONE
>>>>>> complex_values        NONE
>>>>>> projects              NONE
>>>>>> xprojects             NONE
>>>>>> calendar              NONE
>>>>>> initial_state         default
>>>>>> s_rt                  INFINITY
>>>>>> h_rt                  INFINITY
>>>>>> s_cpu                 INFINITY
>>>>>> h_cpu                 INFINITY
>>>>>> s_fsize               INFINITY
>>>>>> h_fsize               INFINITY
>>>>>> s_data                INFINITY
>>>>>> h_data                INFINITY
>>>>>> s_stack               INFINITY
>>>>>> h_stack               INFINITY
>>>>>> s_core                INFINITY
>>>>>> h_core                INFINITY
>>>>>> s_rss                 INFINITY
>>>>>> h_rss                 INFINITY
>>>>>> s_vmem                INFINITY
>>>>>> h_vmem                INFINITY
>>>>>>
>>>>>>
>>>>>> Currently no hosts are assigned to either host group, since none of
>>>>>> the hosts are being used by the SGE adapter service:
>>>>>>
>>>>>> # qconf -shgrp @allhosts
>>>>>> group_name @allhosts
>>>>>> hostlist NONE
>>>>>>
>>>>>> # qconf -shgrp @nohosts
>>>>>> group_name @nohosts
>>>>>> hostlist NONE
>>>>>>
>>>>>> I also turned on the exclusive mode:
>>>>>>
>>>>>> # qconf -sc
>>>>>> #name               shortcut   type        relop   requestable consumable default  urgency
>>>>>> #------------------------------------------------------------------------------------------
>>>>>> ...
>>>>>> exclusive           excl       BOOL        EXCL    YES         YES        0        1000
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> - Chansup
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> M.
>>>>>>>
>>>>>>>
>>>>>>> cbyun wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Michal,
>>>>>>>>
>>>>>>>> Yes, I made a few changes in the SGE configuration.
>>>>>>>>
>>>>>>>> I added a couple of RQS rules, a couple of new cluster queues, and a new
>>>>>>>> hostgroup, @nohosts.
>>>>>>>>
>>>>>>>> Then, in order to keep all.q from being used, I assigned the @nohosts
>>>>>>>> group to all.q.
>>>>>>>>
>>>>>>>> I believe the issue appeared after these customizations.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> - Chansup
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
>>>>>>>>> Sent: Monday, August 24, 2009 3:28 AM
>>>>>>>>> To: users at gridengine.sunsource.net
>>>>>>>>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>>>>>>>>>
>>>>>>>>> Chansup,
>>>>>>>>>
>>>>>>>>> did your SGE change in any way? There is an error coming from JGDI
>>>>>>>>> (from the SGE side). Did you do some kind of "upgrade/downgrade" of
>>>>>>>>> jgdi.jar? I have not seen such an error before, so I will need to dig
>>>>>>>>> into it. I will let you know once I find something.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Michal
>>>>>>>>>
>>>>>>>>> cbyun wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Somehow my GE adapter service got in trouble and I couldn't start it
>>>>>>>>>> any more:
>>>>>>>>>>
>>>>>>>>>> # sdmadm suc -c gesvc2
>>>>>>>>>> comp   host            message
>>>>>>>>>> ----------------------------------------
>>>>>>>>>> gesvc2 llgriddev.local startup triggered
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 08/21/2009 16:25:11|20|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc2: Starting Grid Engine service
>>>>>>>>>> 08/21/2009 16:25:11|20|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>>>>>>>>> |set_object_attribute: set_list of property reportVariables failed
>>>>>>>>>> |
>>>>>>>>>> 08/21/2009 16:25:11|21|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc2: Error in startup procedure: Service gesvc2: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>>>>>>>>> |set_object_attribute: set_list of property reportVariables failed
>>>>>>>>>> |
>>>>>>>>>>
>>>>>>>>>> Is there any way to clear up this error?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> - Chansup
>
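
The step-by-step approach that Rys suggests in the quoted thread can be sketched as a small loop in Python. This is only an illustration under stated assumptions: service_starts is a hypothetical callback that would apply the given changes to a test cell and then try to start gesvc. Here it is replaced by a stub that fails once report_variables is set, which is what the thread eventually found. The change labels are descriptive strings, not qconf commands.

```python
from typing import Callable, List, Optional, Sequence


def first_breaking_change(
    changes: Sequence[str],
    service_starts: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Apply changes one at a time and return the first change after which
    the service no longer starts, or None if it starts with all of them."""
    applied: List[str] = []
    for change in changes:
        applied.append(change)
        if not service_starts(applied):
            return change
    return None


# Customizations mentioned in the thread, as plain labels (hypothetical naming).
CHANGES = [
    "RQS rules",
    "new cluster queues",
    "@nohosts hostgroup",
    "report_variables",
]


def stub_service_starts(applied: Sequence[str]) -> bool:
    """Stand-in for a real check; simulates the failure reported in the thread."""
    return "report_variables" not in applied


if __name__ == "__main__":
    print(first_breaking_change(CHANGES, stub_service_starts))  # prints "report_variables"
```

A linear walk is used instead of a bisection because the list of customizations is short and each step leaves the cell in a known state.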

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214201

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


