[GE users] SDM GE adapter sve got in trouble

zwierzak Ryszard.Macidlowski at sun.com
Wed Aug 26 19:39:45 BST 2009



Hi,

The SDM package doesn't ship jgdi.jar; JGDI is taken from the SGE_ROOT
that you specify in the configuration of the ge_adapter service.

Rys
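
Since the adapter loads jgdi.jar from SGE_ROOT, it can help to check whether the copies under $SGE_ROOT/lib and under the sgeinspect module directory (both paths are the ones quoted further down in this thread) really differ in content and not just in timestamp. What follows is only a minimal sketch: the function name same_jar is hypothetical, and the paths may differ on your install.

```shell
#!/bin/sh
# Hypothetical helper: report whether two jar files have identical content.
# Relies only on POSIX cksum, so no extra tools are assumed.
same_jar() {
    # $1 and $2 are the paths of the two files to compare
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "missing"
        return 2
    fi
    # Reading from stdin makes cksum print only "<crc> <size>", no file name
    sum1=$(cksum < "$1")
    sum2=$(cksum < "$2")
    if [ "$sum1" = "$sum2" ]; then
        echo "same"
    else
        echo "different"
        return 1
    fi
}

# Compare the copies in $SGE_ROOT/lib with the sgeinspect copies.
# Skipped when SGE_ROOT is not set in the current shell.
if [ -n "${SGE_ROOT:-}" ]; then
    for jar in jgdi.jar juti.jar; do
        printf '%s: ' "$jar"
        same_jar "$SGE_ROOT/lib/$jar" \
                 "$SGE_ROOT/sgeinspect/sgeinspect/modules/ext/$jar" || true
    done
fi
```

A "different" result only says the two files are not byte-identical; it does not by itself say which copy the qmaster JMX thread or the adapter has loaded.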


cbyun wrote:
>
> Hi Andre,
>
> I did not restart the SGE qmaster process after modifying the
> java.policy file.
>
> After restarting the SGE qmaster daemon, the following error doesn't
> happen anymore:
>
> 26/08/2009 12:15:07|25|jgdi.jni.EventClientImpl.close|W|Close of event
> client failed
>
> java.security.AccessControlException: access denied
> (java.lang.RuntimePermission modifyThread)
>
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
>
> java.security.AccessController.checkPermission(AccessController.java:546)
>
> java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
>
> java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
>
> java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
>
> com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
>
> com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
>
> com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
>
> com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
>
> com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)
>
> As far as the jgdi.jar is concerned, I don't know why they are
> different even in the distribution package.
>
> I don't even see the jgdi.jar from SDM distribution.
>
> Would this be an issue with SGE 6.2u3 packaging?
>
> $ tar -ztvf sdm-1_0u3-core.tar.gz |grep jgdi (no jgdi.jar included)
>
> $ tar -ztvf sge-6_2u3-bin-linux24-x64.tar.gz |grep jgdi
>
> -rwxr-xr-x root/root 3686513 2009-06-04 09:58:00 lib/lx24-amd64/libjgdi.so
>
> $ tar -ztvf sge-6_2u3-common.tar.gz |grep jgdi
>
> -rw-r--r-- root/root 986077 2009-06-08 05:36:53 lib/jgdi.jar
>
> $ tar -ztvf sge-6_2u3-inspect.tar.gz |grep jgdi
>
> -rw-r--r-- root/root 987594 2009-06-04 09:40:28
> sgeinspect/sgeinspect/modules/ext/jgdi.jar
>
> Thanks,
>
> - Chansup
>
> ------------------------------------------------------------------------
>
> *From:* Andre.Alefeld at sun.com [mailto:Andre.Alefeld at sun.com]
> *Sent:* Wednesday, August 26, 2009 1:01 PM
> *To:* users at gridengine.sunsource.net
> *Subject:* Re: [GE users] SDM GE adapter sve got in trouble
>
> Hi Chansup,
>
> did you restart the master after the java.policy change ?
> The same jgdi.jar and juti.jar files should be used for the JMX thread
> and sgeinspect and the SDM adapter (or am I wrong ?)
> From the red lines below they seem to be different which shouldn't be
> the case.
>
> Andre
>
> cbyun wrote:
>
> Hi Andre,
>
> I'm using SGE 6.2u3 distribution.
>
> # sdmadm sm
> module             version vendor
> -------------------------------------------
> cloud-adapter      1.0     Sun Microsystems
> common             1.0u3   Sun Microsystems
> gridengine-adapter 1.0u3   Sun Microsystems
> security           1.0u3   Sun Microsystems
>
> I did add the permission but I'm still getting the same error:
>
> # diff java.policy java.policy.orig
>
> 13d12
>
> < permission java.lang.RuntimePermission "modifyThread";
>
> # sdmadm sdj -all -h localhost
> jvm   host            result  message
> -------------------------------------
> cs_vm llgriddev.local STOPPED
>
> 26/08/2009 12:15:07|25|jgdi.jni.EventClientImpl.close|W|Close of event
> client failed
>
> java.security.AccessControlException: access denied
> (java.lang.RuntimePermission modifyThread)
>
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
>
> java.security.AccessController.checkPermission(AccessController.java:546)
>
> java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
>
> java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
>
> java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
>
> com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
>
> com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
>
> com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
>
> com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
>
> com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)
>
> I compared the jar files on both directories and they are different in
> size and date.
>
> # pwd
>
> /usr/local/sge/62u3/lib
>
> # ls -l *jar
>
> -rw-r--r-- 1 root root 53280 Jun 8 05:36 drmaa.jar
>
> -rw-r--r-- 1 root root 986077 Jun 8 05:36 jgdi.jar
>
> -rw-r--r-- 1 root root 157498 Jun 8 05:36 juti.jar
>
> # pwd
>
> /usr/local/sge/62u3/sgeinspect/sgeinspect/modules/ext
>
> # ls -l *jar
>
> -rw-r--r-- 1 root root 309294 Jun 4 09:40 jcommon-1.0.15.jar
>
> -rw-r--r-- 1 root root 1368681 Jun 4 09:40 jfreechart-1.0.12.jar
>
> -rw-r--r-- 1 root root 987594 Jun 4 09:40 jgdi.jar
>
> -rw-r--r-- 1 root root 157557 Jun 4 09:40 juti.jar
>
> -rw-r--r-- 1 root root 128224 Jun 4 09:40 sdm-cloud-adapter.jar
>
> -rw-r--r-- 1 root root 1535432 Jun 4 09:40 sdm-common.jar
>
> -rw-r--r-- 1 root root 140989 Jun 4 09:40 sdm-ge-adapter-impl.jar
>
> -rw-r--r-- 1 root root 66867 Jun 4 09:40 sdm-ge-adapter.jar
>
> -rw-r--r-- 1 root root 26982 Jun 4 09:40 sdm-security-impl.jar
>
> -rw-r--r-- 1 root root 158563 Jun 4 09:40 sdm-security.jar
>
> -rw-r--r-- 1 root root 18986 Jun 4 09:40 sdm-starter.jar
>
> Also, I copied both jgdi.jar and juti.jar from $SGE_ROOT/lib to
> $SGE_ROOT/sgeinspect/sgeinspect/modules/ext. It didn't help at all.
>
> I'm also wondering how the inspect modules affect the SDM operation.
>
> Should I copy those jar files to <SDM_INSTALL>/lib directory?
>
> Thanks,
>
> - Chansup
>
> ------------------------------------------------------------------------
>
> *From:* Andre.Alefeld at sun.com [mailto:Andre.Alefeld at sun.com]
> *Sent:* Wednesday, August 26, 2009 9:13 AM
> *To:* users at gridengine.sunsource.net
> *Subject:* Re: [GE users] SDM GE adapter sve got in trouble
>
> Hi Chansup,
>
> can you try to add the following permission to the first grant block
> in $SGE_ROOT/default/common/jmx/java.policy (and
> $SGE_ROOT/util/java.policy.template)
>
> grant codeBase "file:${com.sun.grid.jgdi.sgeRoot}/lib/jgdi.jar" {
> ....
> permission java.lang.RuntimePermission "modifyThread";
> ....
> }
>
> Then the exception should no longer appear.
>
> I guess it is not the cause for the STU_name problem.
> Maybe there is some mismatch between the jgdi.jar versions. Did you
> rebuild the gridengine code yourself ? Or did you use everything from
> the distribution ?
> Could you try to copy the *.jar files from $SGE_ROOT/lib/ to
> $SGE_ROOT/sgeinspect/sgeinspect/modules/ext and check if the problem
> still occurs ?
> I will try to have the arco variable setup you sent in your last email
> and check if I can reproduce it for my setup.
>
> Andre
>
>
> cbyun wrote:
>
> I got some more information.
>
> When I started the GE adapter service, the JGDI log showed the following access denied error.
>
> # sdmadm suc -c gesvc
> comp  host            message
> ---------------------------------------
> gesvc llgriddev.local startup triggered
>
> # tail -f default/spool/qmaster/jgdi0.log
> ...
> 24/08/2009 14:46:54|11|jgdi.jni.EventClientImpl.close|W|Close of event client failed
>                               java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThread)
>                                 java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
>                                 java.security.AccessController.checkPermission(AccessController.java:546)
>                                 java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
>                                 java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
>                                 java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
>                                 com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
>                                 com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
>                                 com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
>                                 com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
>                                 com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)
>
> Thanks,
> - Chansup
>
>
>
>> -----Original Message-----
>> From: Ryszard.Macidlowski at sun.com [mailto:Ryszard.Macidlowski at sun.com]
>> Sent: Monday, August 24, 2009 1:09 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>>
>> Hi Chansup,
>>
>> From the error that I see, I don't think that it's an SDM adapter problem.
>> During startup the adapter connects to the qmaster (JMX thread) using jgdi
>> and retrieves data over this jgdi connection. What I can see in your
>> stack trace is that you get an IllegalStateException from jgdi. As long as
>> jgdi throws this exception, the service will not start and nothing can be
>> done on the SDM side. You didn't make any modifications in
>> ge_adapter_svc_config.xml, I suppose. So either there is a problem with
>> the qmaster (try to restart the qmaster and see if you are able to start
>> gesvc; I suppose you already tried this and it is not the problem), or
>> there is a problem or possible bug in jgdi. You possibly customized SGE in
>> a way that jgdi cannot report or process the customized values.
>>
>> So to clear the error, I suppose you could "undo" your customizations. I
>> would suggest doing it step by step (after each step, try to start the SDM
>> gesvc service to see if it starts, and so track down the problematic change).
>>
>> The second approach would be to install a new SGE cell with the default
>> setup and add the customizations step by step (after each step, try to
>> start the SDM gesvc service to see if it starts, and so track down the
>> problematic change).
>>
>> Of course the third approach would be to debug the jgdi :)
>>
>> BTW, have you checked the jgdi logs to see if there is any information?
>>
>> Rys
>>
>> cbyun wrote:
>>
>>> Hi Michal,
>>>
>>> I tried the following steps but it still failed to clear the issue.
>>> Any suggestion to clear the issue?
>>>
>>> - Remove GE adapter service
>>>         sdmadm rs -s gesvc -force
>>>
>>> - Shutdown/Startup SDM master service
>>>         sdmadm sdj -all -h localhost
>>>         sdmadm suj
>>>
>>> - Add GE adapter service
>>>         sdmadm ags -h localhost -j cs_vm -s gesvc -f \
>>>                   <path>/ge_adapter_svc_config.xml
>>>
>>> - Startup GE component
>>>         sdmadm suc -c gesvc -h localhost
>>>
>>> I'm still getting the same error:
>>>
>>> # 08/24/2009 12:20:08|16|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc: Starting Grid Engine service
>>> 08/24/2009 12:20:08|16|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>> |set_object_attribute: set_list of property reportVariables failed
>>> |
>>> 08/24/2009 12:20:08|17|rm.impl.AbstractComponent$3.performTransition|W|Component gesvc: Error in startup procedure: Service gesvc: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>> |set_object_attribute: set_list of property reportVariables failed
>>> |
>>>
>>>
>>> Thanks,
>>> - Chansup
>>>
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: cbyun [mailto:cbyun at ll.mit.edu]
>>>> Sent: Monday, August 24, 2009 9:51 AM
>>>> To: users at gridengine.sunsource.net
>>>> Subject: RE: [GE users] SDM GE adapter sve got in trouble
>>>>
>>>> Michal,
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
>>>>> Sent: Monday, August 24, 2009 9:23 AM
>>>>> To: users at gridengine.sunsource.net
>>>>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>>>>>
>>>>> Chansup,
>>>>>
>>>>> actually is something called "STU_name" part of your changes? The error
>>>>> says something about "STU_name" not being part of descriptor, so I'd
>>>>> like to know whether it is something you introduced or touched or not ..
>>
>>>>>
>>>>>
>>>> No, I have nothing called "STU_name" in my configuration.
>>>> Here is my configuration:
>>>>
>>>> # qconf -sconf
>>>> #global:
>>>> execd_spool_dir              /var/spool/sge
>>>> mailer                       /bin/mail
>>>> xterm                        /usr/bin/X11/xterm
>>>> load_sensor                  none
>>>> prolog                       none
>>>> epilog                       none
>>>> shell_start_mode             posix_compliant
>>>> login_shells                 sh,ksh,csh,tcsh
>>>> min_uid                      0
>>>> min_gid                      0
>>>> user_lists                   none
>>>> xuser_lists                  none
>>>> projects                     none
>>>> xprojects                    none
>>>> enforce_project              false
>>>> enforce_user                 auto
>>>> load_report_time             00:00:40
>>>> max_unheard                  00:05:00
>>>> reschedule_unknown           00:00:00
>>>> loglevel                     log_warning
>>>> administrator_mail           none
>>>> set_token_cmd                none
>>>> pag_cmd                      none
>>>> token_extend_time            none
>>>> shepherd_cmd                 none
>>>> qmaster_params               none
>>>> execd_params                 none
>>>> reporting_params             accounting=true reporting=true \
>>>>                              flush_time=00:00:15 joblog=true sharelog=00:00:00
>>>> finished_jobs                100
>>>> gid_range                    20000-20100
>>>> qlogin_command               builtin
>>>> qlogin_daemon                builtin
>>>> rlogin_command               builtin
>>>> rlogin_daemon                builtin
>>>> rsh_command                  builtin
>>>> rsh_daemon                   builtin
>>>> max_aj_instances             2000
>>>> max_aj_tasks                 75000
>>>> max_u_jobs                   0
>>>> max_jobs                     0
>>>> max_advance_reservations     0
>>>> auto_user_oticket            0
>>>> auto_user_fshare             0
>>>> auto_user_default_project    none
>>>> auto_user_delete_time        86400
>>>> delegated_file_staging       false
>>>> reprioritize                 0
>>>> jsv_url                      none
>>>> libjvm_path                  /usr/java/latest/jre/lib/amd64/server/libjvm.so
>>>> additional_jvm_args          -Xmx256m
>>>> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
>>>>
>>>>
>>>> # qconf -ssconf
>>>> algorithm                         default
>>>> schedule_interval                 0:2:0
>>>> maxujobs                          0
>>>> queue_sort_method                 load
>>>> job_load_adjustments              NONE
>>>> load_adjustment_decay_time        0:0:0
>>>> load_formula                      np_load_avg
>>>> schedd_job_info                   true
>>>> flush_submit_sec                  2
>>>> flush_finish_sec                  2
>>>> params                            none
>>>> reprioritize_interval             0:0:0
>>>> halftime                          168
>>>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>>>> compensation_factor               5.000000
>>>> weight_user                       0.250000
>>>> weight_project                    0.250000
>>>> weight_department                 0.250000
>>>> weight_job                        0.250000
>>>> weight_tickets_functional         0
>>>> weight_tickets_share              0
>>>> share_override_tickets            TRUE
>>>> share_functional_shares           TRUE
>>>> max_functional_jobs_to_schedule   200
>>>> report_pjob_tickets               FALSE
>>>> max_pending_tasks_per_job         50
>>>> halflife_decay_list               none
>>>> policy_hierarchy                  OFS
>>>> weight_ticket                     0.010000
>>>> weight_waiting_time               0.000000
>>>> weight_deadline                   3600000.000000
>>>> weight_urgency                    0.100000
>>>> weight_priority                   1.000000
>>>> max_reservation                   0
>>>> default_duration                  INFINITY
>>>>
>>>>
>>>> # qconf -srqs
>>>> {
>>>>    name         host_slot_limit
>>>>    description  Limit total number of slots per hosts (assume uniform machines)
>>>>    enabled      TRUE
>>>>    limit        hosts {@allhosts} to slots=2
>>>> }
>>>> {
>>>>    name         max_u_jobs
>>>>    description  max jobs per user
>>>>    enabled      TRUE
>>>>    limit        users {*} to slots=256
>>>> }
>>>>
>>>>
>>>> # for i in `qconf -sql`; do echo " ";echo Qname: $i; qconf -sq $i; done
>>>>
>>>> Qname: all.q
>>>> qname                 all.q
>>>> hostlist              @nohosts
>>>> seq_no                0
>>>> load_thresholds       np_load_avg=1.75
>>>> suspend_thresholds    NONE
>>>> nsuspend              1
>>>> suspend_interval      00:05:00
>>>> priority              0
>>>> min_cpu_interval      00:05:00
>>>> processors            UNDEFINED
>>>> qtype                 BATCH INTERACTIVE
>>>> ckpt_list             NONE
>>>> pe_list               make
>>>> rerun                 FALSE
>>>> slots                 2
>>>> tmpdir                /tmp
>>>> shell                 /bin/csh
>>>> prolog                NONE
>>>> epilog                NONE
>>>> shell_start_mode      posix_compliant
>>>> starter_method        NONE
>>>> suspend_method        NONE
>>>> resume_method         NONE
>>>> terminate_method      NONE
>>>> notify                00:00:60
>>>> owner_list            NONE
>>>> user_lists            NONE
>>>> xuser_lists           NONE
>>>> subordinate_list      NONE
>>>> complex_values        NONE
>>>> projects              NONE
>>>> xprojects             NONE
>>>> calendar              NONE
>>>> initial_state         default
>>>> s_rt                  INFINITY
>>>> h_rt                  INFINITY
>>>> s_cpu                 INFINITY
>>>> h_cpu                 INFINITY
>>>> s_fsize               INFINITY
>>>> h_fsize               INFINITY
>>>> s_data                INFINITY
>>>> h_data                INFINITY
>>>> s_stack               INFINITY
>>>> h_stack               INFINITY
>>>> s_core                INFINITY
>>>> h_core                INFINITY
>>>> s_rss                 INFINITY
>>>> h_rss                 INFINITY
>>>> s_vmem                INFINITY
>>>> h_vmem                INFINITY
>>>>
>>>> Qname: normal
>>>> qname                 normal
>>>> hostlist              @allhosts
>>>> seq_no                0
>>>> load_thresholds       np_load_avg=1.75
>>>> suspend_thresholds    NONE
>>>> nsuspend              1
>>>> suspend_interval      00:05:00
>>>> priority              0
>>>> min_cpu_interval      00:05:00
>>>> processors            UNDEFINED
>>>> qtype                 BATCH INTERACTIVE
>>>> ckpt_list             NONE
>>>> pe_list               make
>>>> rerun                 FALSE
>>>> slots                 2
>>>> tmpdir                /tmp
>>>> shell                 /bin/csh
>>>> prolog                NONE
>>>> epilog                NONE
>>>> shell_start_mode      posix_compliant
>>>> starter_method        NONE
>>>> suspend_method        NONE
>>>> resume_method         NONE
>>>> terminate_method      NONE
>>>> notify                00:00:60
>>>> owner_list            NONE
>>>> user_lists            NONE
>>>> xuser_lists           NONE
>>>> subordinate_list      NONE
>>>> complex_values        NONE
>>>> projects              NONE
>>>> xprojects             NONE
>>>> calendar              NONE
>>>> initial_state         default
>>>> s_rt                  INFINITY
>>>> h_rt                  INFINITY
>>>> s_cpu                 INFINITY
>>>> h_cpu                 INFINITY
>>>> s_fsize               INFINITY
>>>> h_fsize               INFINITY
>>>> s_data                INFINITY
>>>> h_data                INFINITY
>>>> s_stack               INFINITY
>>>> h_stack               INFINITY
>>>> s_core                INFINITY
>>>> h_core                INFINITY
>>>> s_rss                 INFINITY
>>>> h_rss                 INFINITY
>>>> s_vmem                INFINITY
>>>> h_vmem                INFINITY
>>>>
>>>> Qname: pmatlab
>>>> qname                 pmatlab
>>>> hostlist              @allhosts
>>>> seq_no                0
>>>> load_thresholds       np_load_avg=1.75
>>>> suspend_thresholds    NONE
>>>> nsuspend              1
>>>> suspend_interval      00:05:00
>>>> priority              0
>>>> min_cpu_interval      00:05:00
>>>> processors            UNDEFINED
>>>> qtype                 BATCH INTERACTIVE
>>>> ckpt_list             NONE
>>>> pe_list               make
>>>> rerun                 FALSE
>>>> slots                 2
>>>> tmpdir                /tmp
>>>> shell                 /bin/csh
>>>> prolog                NONE
>>>> epilog                NONE
>>>> shell_start_mode      posix_compliant
>>>> starter_method        NONE
>>>> suspend_method        NONE
>>>> resume_method         NONE
>>>> terminate_method      NONE
>>>> notify                00:00:60
>>>> owner_list            NONE
>>>> user_lists            NONE
>>>> xuser_lists           NONE
>>>> subordinate_list      NONE
>>>> complex_values        NONE
>>>> projects              NONE
>>>> xprojects             NONE
>>>> calendar              NONE
>>>> initial_state         default
>>>> s_rt                  INFINITY
>>>> h_rt                  INFINITY
>>>> s_cpu                 INFINITY
>>>> h_cpu                 INFINITY
>>>> s_fsize               INFINITY
>>>> h_fsize               INFINITY
>>>> s_data                INFINITY
>>>> h_data                INFINITY
>>>> s_stack               INFINITY
>>>> h_stack               INFINITY
>>>> s_core                INFINITY
>>>> h_core                INFINITY
>>>> s_rss                 INFINITY
>>>> h_rss                 INFINITY
>>>> s_vmem                INFINITY
>>>> h_vmem                INFINITY
>>>>
>>>>
>>>> Currently no hosts are assigned to either host group, since none of the
>>>> hosts are being used by the SGE adapter service:
>>>>
>>>> # qconf -shgrp @allhosts
>>>> group_name @allhosts
>>>> hostlist NONE
>>>>
>>>> # qconf -shgrp @nohosts
>>>> group_name @nohosts
>>>> hostlist NONE
>>>>
>>>> I also turned on the exclusive mode:
>>>>
>>>> # qconf -sc
>>>> #name               shortcut   type        relop   requestable consumable default  urgency
>>>> #------------------------------------------------------------------------------------------
>>>> ...
>>>> exclusive           excl       BOOL        EXCL    YES         YES        0        1000
>>>>
>>>>
>>>> Thanks,
>>>> - Chansup
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> M.
>>>>>
>>>>>
>>>>> cbyun wrote:
>>>>>
>>>>>
>>>>>> Michal,
>>>>>>
>>>>>> Yes, I made a few changes in the SGE configuration.
>>>>>>
>>>>>> I added a couple of RQS rules, a couple of new cluster queues and a
>>>>>> new hostgroup, @nohosts.
>>>>>> Then, in order to keep all.q from being used, I assigned the @nohosts
>>>>>> group to all.q.
>>>>>>
>>>>>> I believe the issue appeared after these customizations.
>>>>>>
>>>>>> Thanks,
>>>>>> - Chansup
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
>>>>>>> Sent: Monday, August 24, 2009 3:28 AM
>>>>>>> To: users at gridengine.sunsource.net
>>>>>>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>>>>>>>
>>>>>>> Chansup,
>>>>>>>
>>>>>>> did your SGE change in any way? There is an error coming from jgdi
>>>>>>> (from the SGE side). Did you do some kind of "upgrade/downgrade" of
>>>>>>> jgdi.jar? I have not seen such an error before, so I will need to dig
>>>>>>> into it - I will let you know once I find something.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Michal
>>>>>>>
>>>>>>> cbyun wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Somehow my ge adapter service got in trouble and I couldn't start it
>>>>>>>> any more:
>>>>>>>>
>>>>>>>> # sdmadm suc -c gesvc2
>>>>>>>> comp   host            message
>>>>>>>> ----------------------------------------
>>>>>>>> gesvc2 llgriddev.local startup triggered
>>>>>>>>
>>>>>>>>
>>>>>>>> 08/21/2009 16:25:11|20|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc2: Starting Grid Engine service
>>>>>>>> 08/21/2009 16:25:11|20|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>>>>>>> |set_object_attribute: set_list of property reportVariables failed
>>>>>>>> |
>>>>>>>> 08/21/2009 16:25:11|21|rm.impl.AbstractComponent$3.performTransition|W|Component gesvc2: Error in startup procedure: Service gesvc2: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>>>>>>> |set_object_attribute: set_list of property reportVariables failed
>>>>>>>> |
>>>>>>>>
>>>>>>>> Is there any way to clear up this error?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> - Chansup
>>>>>>>>
>>>>>>>> ------------------------------------------------------
>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213536
>>>>>>>>
>>>>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>>>>
>
> --
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> Andre Alefeld                Phone: ++49 (0)941 3075-255
> Software Engineering         Fax:   ++49 (0)941 3075-222
> Sun Microsystems GmbH
> Dr.-Leo-Ritter-Str. 7      mailto: andre.alefeld at sun.com <mailto:andre.alefeld at sun.com>
> D-93049 Regensburg           http://www.sun.com/gridware
>
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214419

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


