[GE users] SDM GE adapter sve got in trouble

easymf michal.bachorik at sun.com
Wed Aug 26 19:47:19 BST 2009



zwierzak wrote:
> Hi,
>
> The SDM package doesn't provide jgdi.jar; jgdi is taken from the SGE_ROOT
> that you specify for the ge_adapter service in its configuration.
>
>
A little correction: by "taken", Rys means that SDM uses the jgdi.jar and
juti.jar from the SGE_ROOT that you specified in each ge_adapter's
configuration. Nothing is copied or moved; the jar files remain in the
SGE_ROOT/lib folder.
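To check which jars a given ge_adapter will actually load, it is enough to look in `<SGE_ROOT>/lib` for the SGE_ROOT configured in that service. A minimal sketch, assuming only what is stated above (jars are read in place from `<SGE_ROOT>/lib`, nothing is copied); the helper names `adapter_jar_paths` and `missing_adapter_jars` are hypothetical and not part of SDM:

```python
from pathlib import Path

# Jars that, per this thread, SDM reads in place from <SGE_ROOT>/lib
# for each ge_adapter service.
ADAPTER_JARS = ("jgdi.jar", "juti.jar")


def adapter_jar_paths(sge_root):
    """Hypothetical helper: map each jar name to <sge_root>/lib/<jar>.

    Only computes locations; nothing is copied or moved.
    """
    lib_dir = Path(sge_root) / "lib"
    return {name: lib_dir / name for name in ADAPTER_JARS}


def missing_adapter_jars(sge_root):
    """Return the names of expected jars that are absent from <sge_root>/lib."""
    return [name for name, path in adapter_jar_paths(sge_root).items()
            if not path.is_file()]
```

On the installation shown later in the thread, where both jars are listed under /usr/local/sge/62u3/lib, `missing_adapter_jars("/usr/local/sge/62u3")` would return an empty list.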

M.

> Rys
>
>
> cbyun wrote:
>
>> Hi Andre,
>>
>> I did not restart the SGE qmaster process after modifying the
>> java.policy file.
>>
>> After restarting the SGE qmaster daemon, the following error doesn't
>> happen anymore:
>>
>> 26/08/2009 12:15:07|25|jgdi.jni.EventClientImpl.close|W|Close of event client failed
>>     java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThread)
>>     java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
>>     java.security.AccessController.checkPermission(AccessController.java:546)
>>     java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
>>     java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
>>     java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
>>     com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
>>     com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
>>     com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
>>     com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
>>     com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)
>>
>> As far as jgdi.jar is concerned, I don't know why the copies are
>> different even in the distribution packages.
>>
>> I don't even see a jgdi.jar in the SDM distribution.
>>
>> Would this be an issue with SGE 6.2u3 packaging?
>>
>> $ tar -ztvf sdm-1_0u3-core.tar.gz |grep jgdi (no jgdi.jar included)
>>
>> $ tar -ztvf sge-6_2u3-bin-linux24-x64.tar.gz |grep jgdi
>> -rwxr-xr-x root/root 3686513 2009-06-04 09:58:00 lib/lx24-amd64/libjgdi.so
>>
>> $ tar -ztvf sge-6_2u3-common.tar.gz |grep jgdi
>> -rw-r--r-- root/root 986077 2009-06-08 05:36:53 lib/jgdi.jar
>>
>> $ tar -ztvf sge-6_2u3-inspect.tar.gz |grep jgdi
>> -rw-r--r-- root/root 987594 2009-06-04 09:40:28 sgeinspect/sgeinspect/modules/ext/jgdi.jar
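When several distribution archives need checking, the `tar | grep` loop above can be scripted. A minimal sketch using Python's standard `tarfile` module, roughly equivalent to `tar -ztvf ARCHIVE | grep jgdi`: it returns each matching member's path, size in bytes, and modification time in seconds since the epoch. The function name `find_members` is hypothetical.

```python
import tarfile


def find_members(tar_path, needle="jgdi"):
    """Return (name, size, mtime) for archive members whose path contains needle.

    Mode "r:*" lets tarfile detect gzip compression on its own.
    """
    hits = []
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar.getmembers():
            if needle in member.name:
                hits.append((member.name, member.size, member.mtime))
    return hits
```

For example, calling `find_members("sge-6_2u3-common.tar.gz")` and `find_members("sge-6_2u3-inspect.tar.gz")` makes the size difference between the two jgdi.jar copies easy to spot side by side.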
>>
>> Thanks,
>>
>> - Chansup
>>
>> ------------------------------------------------------------------------
>>
>> *From:* Andre.Alefeld at sun.com [mailto:Andre.Alefeld at sun.com]
>> *Sent:* Wednesday, August 26, 2009 1:01 PM
>> *To:* users at gridengine.sunsource.net
>> *Subject:* Re: [GE users] SDM GE adapter sve got in trouble
>>
>> Hi Chansup,
>>
>> did you restart the master after the java.policy change?
>> The same jgdi.jar and juti.jar files should be used by the JMX thread,
>> sgeinspect, and the SDM adapter (or am I wrong?).
>> From the listings below, they seem to be different, which shouldn't be
>> the case.
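Size and date only hint at a difference; comparing content hashes settles whether two jar copies are the same file. A minimal sketch, assuming the two directories discussed in this thread ($SGE_ROOT/lib and $SGE_ROOT/sgeinspect/sgeinspect/modules/ext); the helper names `sha256_of` and `compare_jar_dirs` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_jar_dirs(dir_a, dir_b, names=("jgdi.jar", "juti.jar")):
    """Map each jar name to True (identical), False (different) or None (missing)."""
    result = {}
    for name in names:
        a, b = Path(dir_a) / name, Path(dir_b) / name
        if not (a.is_file() and b.is_file()):
            result[name] = None
        else:
            result[name] = sha256_of(a) == sha256_of(b)
    return result
```

With the sizes reported later in the thread (986077 vs 987594 bytes), `compare_jar_dirs("/usr/local/sge/62u3/lib", "/usr/local/sge/62u3/sgeinspect/sgeinspect/modules/ext")` would report False for jgdi.jar on that installation.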
>>
>> Andre
>>
>> cbyun wrote:
>>
>> Hi Andre,
>>
>> I'm using the SGE 6.2u3 distribution.
>>
>> # sdmadm sm
>> module             version vendor
>> -------------------------------------------
>> cloud-adapter      1.0     Sun Microsystems
>> common             1.0u3   Sun Microsystems
>> gridengine-adapter 1.0u3   Sun Microsystems
>> security           1.0u3   Sun Microsystems
>>
>> I did add the permission, but I'm still getting the same error:
>>
>> # diff java.policy java.policy.orig
>>
>> 13d12
>>
>> < permission java.lang.RuntimePermission "modifyThread";
>>
>> # sdmadm sdj -all -h localhost
>> jvm   host            result  message
>> -------------------------------------
>> cs_vm llgriddev.local STOPPED
>>
>> 26/08/2009 12:15:07|25|jgdi.jni.EventClientImpl.close|W|Close of event client failed
>>     java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThread)
>>     java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
>>     java.security.AccessController.checkPermission(AccessController.java:546)
>>     java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
>>     java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
>>     java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
>>     com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
>>     com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
>>     com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
>>     com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
>>     com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)
>>
>> I compared the jar files in both directories, and they differ in
>> size and date.
>>
>> # pwd
>> /usr/local/sge/62u3/lib
>> # ls -l *jar
>> -rw-r--r-- 1 root root 53280 Jun 8 05:36 drmaa.jar
>> -rw-r--r-- 1 root root 986077 Jun 8 05:36 jgdi.jar
>> -rw-r--r-- 1 root root 157498 Jun 8 05:36 juti.jar
>>
>> # pwd
>> /usr/local/sge/62u3/sgeinspect/sgeinspect/modules/ext
>> # ls -l *jar
>> -rw-r--r-- 1 root root 309294 Jun 4 09:40 jcommon-1.0.15.jar
>> -rw-r--r-- 1 root root 1368681 Jun 4 09:40 jfreechart-1.0.12.jar
>> -rw-r--r-- 1 root root 987594 Jun 4 09:40 jgdi.jar
>> -rw-r--r-- 1 root root 157557 Jun 4 09:40 juti.jar
>> -rw-r--r-- 1 root root 128224 Jun 4 09:40 sdm-cloud-adapter.jar
>> -rw-r--r-- 1 root root 1535432 Jun 4 09:40 sdm-common.jar
>> -rw-r--r-- 1 root root 140989 Jun 4 09:40 sdm-ge-adapter-impl.jar
>> -rw-r--r-- 1 root root 66867 Jun 4 09:40 sdm-ge-adapter.jar
>> -rw-r--r-- 1 root root 26982 Jun 4 09:40 sdm-security-impl.jar
>> -rw-r--r-- 1 root root 158563 Jun 4 09:40 sdm-security.jar
>> -rw-r--r-- 1 root root 18986 Jun 4 09:40 sdm-starter.jar
>>
>> Also, I copied both jgdi.jar and juti.jar from $SGE_ROOT/lib to
>> $SGE_ROOT/sgeinspect/sgeinspect/modules/ext. It didn't help at all.
>>
>> I'm also wondering how the inspect modules affect the SDM operation.
>>
>> Should I copy those jar files to the <SDM_INSTALL>/lib directory?
>>
>> Thanks,
>>
>> - Chansup
>>
>> ------------------------------------------------------------------------
>>
>> *From:* Andre.Alefeld at sun.com [mailto:Andre.Alefeld at sun.com]
>> *Sent:* Wednesday, August 26, 2009 9:13 AM
>> *To:* users at gridengine.sunsource.net
>> *Subject:* Re: [GE users] SDM GE adapter sve got in trouble
>>
>> Hi Chansup,
>>
>> can you try to add the following permission to the first grant block
>> in $SGE_ROOT/default/common/jmx/java.policy (and
>> $SGE_ROOT/util/java.policy.template)
>>
>> grant codeBase "file:${com.sun.grid.jgdi.sgeRoot}/lib/jgdi.jar" {
>> ....
>> permission java.lang.RuntimePermission "modifyThread";
>> ....
>> }
>>
>> Then the exception should no longer appear.
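The edit above can be scripted so that both files receive the same change and re-running it is harmless. A minimal sketch, assuming the files follow standard Java policy syntax (`grant ... { permission ...; };`) and that comments inside the first grant block contain no quotes or braces; `add_to_first_grant` is a hypothetical helper. It inserts the permission only into the first grant block and returns the text unchanged if the permission is already there. Per this thread, the qmaster also had to be restarted before the change took effect.

```python
import re

PERMISSION = 'permission java.lang.RuntimePermission "modifyThread";'

# "grant", then anything except "{" up to the block's opening brace.
# Quoted strings are skipped whole because a codeBase such as
# "file:${com.sun.grid.jgdi.sgeRoot}/lib/jgdi.jar" contains braces itself.
GRANT_START = re.compile(r'\bgrant\b(?:[^{"]|"[^"]*")*\{')


def _block_end(text, start):
    """Index of the first '}' after start that is not inside double quotes."""
    in_quote = False
    for i in range(start, len(text)):
        ch = text[i]
        if ch == '"':
            in_quote = not in_quote
        elif ch == "}" and not in_quote:
            return i
    raise ValueError("unterminated grant block")


def add_to_first_grant(policy_text, permission=PERMISSION):
    """Return policy_text with `permission` added to its first grant block."""
    match = GRANT_START.search(policy_text)
    if match is None:
        raise ValueError("no grant block found")
    end = _block_end(policy_text, match.end())
    if permission in policy_text[match.end():end]:
        return policy_text  # already present, nothing to do
    head = policy_text[:end]
    if not head.endswith("\n"):
        head += "\n"
    # New permission goes on its own line, just before the closing brace.
    return head + "    " + permission + "\n" + policy_text[end:]
```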
>>
>> I guess it is not the cause of the STU_name problem.
>> Maybe there is some mismatch between the jgdi.jar versions. Did you
>> rebuild the gridengine code yourself, or did you use everything from
>> the distribution?
>> Could you try to copy the *.jar files from $SGE_ROOT/lib/ to
>> $SGE_ROOT/sgeinspect/sgeinspect/modules/ext and check if the problem
>> still occurs?
>> I will try the ARCo variable setup you sent in your last email
>> and check if I can reproduce it on my setup.
>>
>> Andre
>>
>>
>> cbyun wrote:
>>
>> I got some more information.
>>
>> When I started the GE adapter service, the JGDI log showed the following access denied error.
>>
>> # sdmadm suc -c gesvc
>> comp  host            message
>> ---------------------------------------
>> gesvc llgriddev.local startup triggered
>>
>> # tail -f default/spool/qmaster/jgdi0.log
>> ...
>> 24/08/2009 14:46:54|11|jgdi.jni.EventClientImpl.close|W|Close of event client failed
>>                               java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThread)
>>                                 java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
>>                                 java.security.AccessController.checkPermission(AccessController.java:546)
>>                                 java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
>>                                 java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
>>                                 java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
>>                                 com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
>>                                 com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
>>                                 com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
>>                                 com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
>>                                 com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)
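The jgdi log entries above follow a `date time|thread|source|level|message` layout, with stack frames on indented continuation lines. Inferring that format only from these excerpts (it is not a documented specification), a minimal sketch that groups continuation lines under their entry and keeps only warnings and errors could look like this; `LogEntry`, `parse_log` and `problems` are hypothetical names. The SDM log lines quoted further down use a month-first date, but since the timestamp is kept as a plain string the same pattern matches them too.

```python
import re
from dataclasses import dataclass, field

# Entry header as seen in the excerpts: "<date> <time>|<thread>|<source>|<level>|<message>"
HEADER = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|(?P<thread>\d+)\|"
    r"(?P<source>[^|]+)\|(?P<level>[A-Z])\|(?P<message>.*)$")


@dataclass
class LogEntry:
    stamp: str
    thread: int
    source: str
    level: str
    message: str
    details: list = field(default_factory=list)  # continuation lines (stack frames)


def parse_log(lines):
    """Group raw log lines into entries; indented lines attach to the previous entry."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER.match(line)
        if match:
            entries.append(LogEntry(match["stamp"], int(match["thread"]),
                                    match["source"], match["level"], match["message"]))
        elif entries and line.strip():
            entries[-1].details.append(line.strip())
    return entries


def problems(entries, levels=("W", "E")):
    """Keep only entries whose level is in `levels`."""
    return [e for e in entries if e.level in levels]
```

For example, opening `default/spool/qmaster/jgdi0.log` (the file tailed above) and passing the handle to `parse_log` lists each "Close of event client failed" warning together with its AccessControlException frames.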
>>
>> Thanks,
>> - Chansup
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: Ryszard.Macidlowski at sun.com [mailto:Ryszard.Macidlowski at sun.com]
>>> Sent: Monday, August 24, 2009 1:09 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>>>
>>> Hi Chansup,
>>>
>>> From the error that I see, I don't think it's an SDM adapter problem.
>>> During startup, the adapter connects to the qmaster (JMX thread) using
>>> jgdi and retrieves data over this jgdi connection. What I can see in
>>> your stack trace is that you get an IllegalStateException from jgdi.
>>> As long as jgdi throws this exception, the service will not start and
>>> nothing can be done on the SDM side. I suppose you didn't make any
>>> modifications to ge_adapter_svc_config.xml. So either there is a problem
>>> with the qmaster (try restarting the qmaster and see if you are able to
>>> start gesvc; I suppose you already tried this and it is not the
>>> problem), or there is a problem, possibly a bug, in jgdi. You may have
>>> customized SGE in a way that jgdi cannot report/process the customized
>>> values.
>>>
>>> So to clear the error, I suppose you could "undo" your customizations. I
>>> would suggest doing it step by step (after each step, try to start the
>>> SDM gesvc service to see if it starts, and track down the problematic
>>> change).
>>>
>>> The second approach would be to install a new SGE cell with the default
>>> setup and add your customizations step by step (after each step, try to
>>> start the SDM gesvc service to see if it starts, and track down the
>>> problematic change).
>>>
>>> Of course, the third approach would be to debug jgdi :)
>>>
>>> BTW, have you checked the jgdi logs for any relevant information?
>>>
>>> Rys
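The first approach above is a linear search over the customizations. A minimal sketch, assuming each customization can be undone independently and that a single change is responsible; `revert` and `service_starts` are hypothetical callbacks you would implement yourself (for example, undoing one change with `qconf` and then triggering `sdmadm suc -c gesvc` and checking whether the component starts). Undoing newest-first is an arbitrary choice here, not something the thread prescribes.

```python
from typing import Callable, Optional, Sequence


def find_breaking_change(changes: Sequence[str],
                         revert: Callable[[str], None],
                         service_starts: Callable[[], bool]) -> Optional[str]:
    """Undo customizations one at a time, newest first, until gesvc starts.

    changes        -- customizations in the order they were applied
    revert         -- hypothetical callback that undoes one customization
    service_starts -- hypothetical callback that tries to start the service
                      and returns True on success
    Returns the change whose removal let the service start, or None if
    undoing every change did not help.
    """
    for change in reversed(changes):
        revert(change)
        if service_starts():
            return change
    return None
```

The second approach is the mirror image: start from a fresh cell, apply the changes in order, and stop at the first one after which the service no longer starts.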
>>>
>>> cbyun wrote:
>>>
>>>
>>>> Hi Michal,
>>>>
>>>> I tried the following steps, but they did not clear the issue.
>>>> Any suggestions for clearing it?
>>>>
>>>> - Remove GE adapter service
>>>>         sdmadm rs -s gesvc -force
>>>>
>>>> - Shutdown/Startup SDM master service
>>>>         sdmadm sdj -all -h localhost
>>>>         sdmadm suj
>>>>
>>>> - Add GE adapter service
>>>>         sdmadm ags -h localhost -j cs_vm -s gesvc -f \
>>>>                   <path>/ge_adapter_svc_config.xml
>>>>
>>>> - Startup GE component
>>>>         sdmadm suc -c gesvc -h localhost
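The same sequence can be scripted so it is repeated exactly each time. A sketch assuming `sdmadm` is on PATH and that the service, JVM and host names (`gesvc`, `cs_vm`, `localhost`) match this setup; the commands and flags are copied from the steps above, while the wrapper functions are hypothetical. It defaults to a dry run that only prints the commands. As the replies show, re-creating the service did not clear this particular error, since it originated in jgdi.

```python
import shlex
import subprocess


def recovery_steps(config_path):
    """The five sdmadm calls from the message above, in order.

    config_path stands in for <path>/ge_adapter_svc_config.xml.
    """
    return [
        ["sdmadm", "rs", "-s", "gesvc", "-force"],            # remove GE adapter service
        ["sdmadm", "sdj", "-all", "-h", "localhost"],         # shut down the JVMs
        ["sdmadm", "suj"],                                    # start the JVMs again
        ["sdmadm", "ags", "-h", "localhost", "-j", "cs_vm", "-s", "gesvc",
         "-f", config_path],                                  # add GE adapter service
        ["sdmadm", "suc", "-c", "gesvc", "-h", "localhost"],  # start the component
    ]


def run_steps(steps, dry_run=True):
    """Echo each command; unless dry_run, also run it and stop on the first failure.

    Returns the echoed command lines.
    """
    echoed = []
    for cmd in steps:
        line = shlex.join(cmd)
        print("$", line)
        echoed.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return echoed
```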
>>>>
>>>> I'm still getting the same error:
>>>>
>>>> # 08/24/2009 12:20:08|16|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc: Starting Grid Engine service
>>>> 08/24/2009 12:20:08|16|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>>>   |set_object_attribute: set_list of property reportVariables failed
>>>>   |
>>>> 08/24/2009 12:20:08|17|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc: Error in startup procedure: Service gesvc: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>>>   |set_object_attribute: set_list of property reportVariables failed
>>>>   |
>>>>
>>>>
>>>> Thanks,
>>>> - Chansup
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: cbyun [mailto:cbyun at ll.mit.edu]
>>>>> Sent: Monday, August 24, 2009 9:51 AM
>>>>> To: users at gridengine.sunsource.net
>>>>> Subject: RE: [GE users] SDM GE adapter sve got in trouble
>>>>>
>>>>> Michal,
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
>>>>>> Sent: Monday, August 24, 2009 9:23 AM
>>>>>> To: users at gridengine.sunsource.net
>>>>>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>>>>>>
>>>>>> Chansup,
>>>>>>
>>>>>> is something called "STU_name" actually part of your changes? The
>>>>>> error says something about "STU_name" not being part of the
>>>>>> descriptor, so I'd like to know whether it is something you
>>>>>> introduced or touched or not.
>>>
>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> No, I have nothing called "STU_name" in my configuration.
>>>>> Here is my configuration:
>>>>>
>>>>> # qconf -sconf
>>>>> #global:
>>>>> execd_spool_dir              /var/spool/sge
>>>>> mailer                       /bin/mail
>>>>> xterm                        /usr/bin/X11/xterm
>>>>> load_sensor                  none
>>>>> prolog                       none
>>>>> epilog                       none
>>>>> shell_start_mode             posix_compliant
>>>>> login_shells                 sh,ksh,csh,tcsh
>>>>> min_uid                      0
>>>>> min_gid                      0
>>>>> user_lists                   none
>>>>> xuser_lists                  none
>>>>> projects                     none
>>>>> xprojects                    none
>>>>> enforce_project              false
>>>>> enforce_user                 auto
>>>>> load_report_time             00:00:40
>>>>> max_unheard                  00:05:00
>>>>> reschedule_unknown           00:00:00
>>>>> loglevel                     log_warning
>>>>> administrator_mail           none
>>>>> set_token_cmd                none
>>>>> pag_cmd                      none
>>>>> token_extend_time            none
>>>>> shepherd_cmd                 none
>>>>> qmaster_params               none
>>>>> execd_params                 none
>>>>> reporting_params             accounting=true reporting=true \
>>>>>                              flush_time=00:00:15 joblog=true sharelog=00:00:00
>>>>> finished_jobs                100
>>>>> gid_range                    20000-20100
>>>>> qlogin_command               builtin
>>>>> qlogin_daemon                builtin
>>>>> rlogin_command               builtin
>>>>> rlogin_daemon                builtin
>>>>> rsh_command                  builtin
>>>>> rsh_daemon                   builtin
>>>>> max_aj_instances             2000
>>>>> max_aj_tasks                 75000
>>>>> max_u_jobs                   0
>>>>> max_jobs                     0
>>>>> max_advance_reservations     0
>>>>> auto_user_oticket            0
>>>>> auto_user_fshare             0
>>>>> auto_user_default_project    none
>>>>> auto_user_delete_time        86400
>>>>> delegated_file_staging       false
>>>>> reprioritize                 0
>>>>> jsv_url                      none
>>>>> libjvm_path                  /usr/java/latest/jre/lib/amd64/server/libjvm.so
>>>>> additional_jvm_args          -Xmx256m
>>>>> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
>>>>>
>>>>>
>>>>> # qconf -ssconf
>>>>> algorithm                         default
>>>>> schedule_interval                 0:2:0
>>>>> maxujobs                          0
>>>>> queue_sort_method                 load
>>>>> job_load_adjustments              NONE
>>>>> load_adjustment_decay_time        0:0:0
>>>>> load_formula                      np_load_avg
>>>>> schedd_job_info                   true
>>>>> flush_submit_sec                  2
>>>>> flush_finish_sec                  2
>>>>> params                            none
>>>>> reprioritize_interval             0:0:0
>>>>> halftime                          168
>>>>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>>>>> compensation_factor               5.000000
>>>>> weight_user                       0.250000
>>>>> weight_project                    0.250000
>>>>> weight_department                 0.250000
>>>>> weight_job                        0.250000
>>>>> weight_tickets_functional         0
>>>>> weight_tickets_share              0
>>>>> share_override_tickets            TRUE
>>>>> share_functional_shares           TRUE
>>>>> max_functional_jobs_to_schedule   200
>>>>> report_pjob_tickets               FALSE
>>>>> max_pending_tasks_per_job         50
>>>>> halflife_decay_list               none
>>>>> policy_hierarchy                  OFS
>>>>> weight_ticket                     0.010000
>>>>> weight_waiting_time               0.000000
>>>>> weight_deadline                   3600000.000000
>>>>> weight_urgency                    0.100000
>>>>> weight_priority                   1.000000
>>>>> max_reservation                   0
>>>>> default_duration                  INFINITY
>>>>>
>>>>>
>>>>> # qconf -srqs
>>>>> {
>>>>>    name         host_slot_limit
>>>>>    description  Limit total number of slots per hosts (assume uniform machines)
>>>>>    enabled      TRUE
>>>>>    limit        hosts {@allhosts} to slots=2
>>>>> }
>>>>> {
>>>>>    name         max_u_jobs
>>>>>    description  max jobs per user
>>>>>    enabled      TRUE
>>>>>    limit        users {*} to slots=256
>>>>> }
>>>>>
>>>>>
>>>>> # for i in `qconf -sql`; do echo " ";echo Qname: $i; qconf -sq $i; done
>>>>>
>>>>> Qname: all.q
>>>>> qname                 all.q
>>>>> hostlist              @nohosts
>>>>> seq_no                0
>>>>> load_thresholds       np_load_avg=1.75
>>>>> suspend_thresholds    NONE
>>>>> nsuspend              1
>>>>> suspend_interval      00:05:00
>>>>> priority              0
>>>>> min_cpu_interval      00:05:00
>>>>> processors            UNDEFINED
>>>>> qtype                 BATCH INTERACTIVE
>>>>> ckpt_list             NONE
>>>>> pe_list               make
>>>>> rerun                 FALSE
>>>>> slots                 2
>>>>> tmpdir                /tmp
>>>>> shell                 /bin/csh
>>>>> prolog                NONE
>>>>> epilog                NONE
>>>>> shell_start_mode      posix_compliant
>>>>> starter_method        NONE
>>>>> suspend_method        NONE
>>>>> resume_method         NONE
>>>>> terminate_method      NONE
>>>>> notify                00:00:60
>>>>> owner_list            NONE
>>>>> user_lists            NONE
>>>>> xuser_lists           NONE
>>>>> subordinate_list      NONE
>>>>> complex_values        NONE
>>>>> projects              NONE
>>>>> xprojects             NONE
>>>>> calendar              NONE
>>>>> initial_state         default
>>>>> s_rt                  INFINITY
>>>>> h_rt                  INFINITY
>>>>> s_cpu                 INFINITY
>>>>> h_cpu                 INFINITY
>>>>> s_fsize               INFINITY
>>>>> h_fsize               INFINITY
>>>>> s_data                INFINITY
>>>>> h_data                INFINITY
>>>>> s_stack               INFINITY
>>>>> h_stack               INFINITY
>>>>> s_core                INFINITY
>>>>> h_core                INFINITY
>>>>> s_rss                 INFINITY
>>>>> h_rss                 INFINITY
>>>>> s_vmem                INFINITY
>>>>> h_vmem                INFINITY
>>>>>
>>>>> Qname: normal
>>>>> qname                 normal
>>>>> hostlist              @allhosts
>>>>> seq_no                0
>>>>> load_thresholds       np_load_avg=1.75
>>>>> suspend_thresholds    NONE
>>>>> nsuspend              1
>>>>> suspend_interval      00:05:00
>>>>> priority              0
>>>>> min_cpu_interval      00:05:00
>>>>> processors            UNDEFINED
>>>>> qtype                 BATCH INTERACTIVE
>>>>> ckpt_list             NONE
>>>>> pe_list               make
>>>>> rerun                 FALSE
>>>>> slots                 2
>>>>> tmpdir                /tmp
>>>>> shell                 /bin/csh
>>>>> prolog                NONE
>>>>> epilog                NONE
>>>>> shell_start_mode      posix_compliant
>>>>> starter_method        NONE
>>>>> suspend_method        NONE
>>>>> resume_method         NONE
>>>>> terminate_method      NONE
>>>>> notify                00:00:60
>>>>> owner_list            NONE
>>>>> user_lists            NONE
>>>>> xuser_lists           NONE
>>>>> subordinate_list      NONE
>>>>> complex_values        NONE
>>>>> projects              NONE
>>>>> xprojects             NONE
>>>>> calendar              NONE
>>>>> initial_state         default
>>>>> s_rt                  INFINITY
>>>>> h_rt                  INFINITY
>>>>> s_cpu                 INFINITY
>>>>> h_cpu                 INFINITY
>>>>> s_fsize               INFINITY
>>>>> h_fsize               INFINITY
>>>>> s_data                INFINITY
>>>>> h_data                INFINITY
>>>>> s_stack               INFINITY
>>>>> h_stack               INFINITY
>>>>> s_core                INFINITY
>>>>> h_core                INFINITY
>>>>> s_rss                 INFINITY
>>>>> h_rss                 INFINITY
>>>>> s_vmem                INFINITY
>>>>> h_vmem                INFINITY
>>>>>
>>>>> Qname: pmatlab
>>>>> qname                 pmatlab
>>>>> hostlist              @allhosts
>>>>> seq_no                0
>>>>> load_thresholds       np_load_avg=1.75
>>>>> suspend_thresholds    NONE
>>>>> nsuspend              1
>>>>> suspend_interval      00:05:00
>>>>> priority              0
>>>>> min_cpu_interval      00:05:00
>>>>> processors            UNDEFINED
>>>>> qtype                 BATCH INTERACTIVE
>>>>> ckpt_list             NONE
>>>>> pe_list               make
>>>>> rerun                 FALSE
>>>>> slots                 2
>>>>> tmpdir                /tmp
>>>>> shell                 /bin/csh
>>>>> prolog                NONE
>>>>> epilog                NONE
>>>>> shell_start_mode      posix_compliant
>>>>> starter_method        NONE
>>>>> suspend_method        NONE
>>>>> resume_method         NONE
>>>>> terminate_method      NONE
>>>>> notify                00:00:60
>>>>> owner_list            NONE
>>>>> user_lists            NONE
>>>>> xuser_lists           NONE
>>>>> subordinate_list      NONE
>>>>> complex_values        NONE
>>>>> projects              NONE
>>>>> xprojects             NONE
>>>>> calendar              NONE
>>>>> initial_state         default
>>>>> s_rt                  INFINITY
>>>>> h_rt                  INFINITY
>>>>> s_cpu                 INFINITY
>>>>> h_cpu                 INFINITY
>>>>> s_fsize               INFINITY
>>>>> h_fsize               INFINITY
>>>>> s_data                INFINITY
>>>>> h_data                INFINITY
>>>>> s_stack               INFINITY
>>>>> h_stack               INFINITY
>>>>> s_core                INFINITY
>>>>> h_core                INFINITY
>>>>> s_rss                 INFINITY
>>>>> h_rss                 INFINITY
>>>>> s_vmem                INFINITY
>>>>> h_vmem                INFINITY
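Since the thread narrows the problem down to configuration changes, it helps to see exactly where the queues differ. In the listings above, all.q, normal and pmatlab appear to differ only in `qname` and `hostlist`. A minimal sketch for checking that mechanically, assuming the whitespace-separated `key value` layout shown above with backslash line continuations; `parse_qconf` and `diff_queues` are hypothetical names, and the input would be the text printed by `qconf -sq <queue>`.

```python
def parse_qconf(text):
    """Parse qconf "key value" output (e.g. `qconf -sq`) into a dict.

    A trailing backslash continues the value on the next line; runs of
    whitespace inside values are collapsed to single spaces.
    """
    conf, pending = {}, ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        if line.endswith("\\"):
            pending = line[:-1] + " "
            continue
        pending = ""
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = " ".join(parts[1].split()) if len(parts) > 1 else ""
    return conf


def diff_queues(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}
```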
>>>>>
>>>>>
>>>>> Currently no hosts are assigned to either host group, since none of the
>>>>> hosts are being used by the SGE adapter service:
>>>>>
>>>>> # qconf -shgrp @allhosts
>>>>> group_name @allhosts
>>>>> hostlist NONE
>>>>>
>>>>> # qconf -shgrp @nohosts
>>>>> group_name @nohosts
>>>>> hostlist NONE
>>>>>
>>>>> I also turned on the exclusive mode:
>>>>>
>>>>> # qconf -sc
>>>>> #name               shortcut   type        relop   requestable consumable default  urgency
>>>>> #------------------------------------------------------------------------------------------
>>>>> ...
>>>>> exclusive           excl       BOOL        EXCL    YES         YES        0        1000
>>>>>
>>>>>
>>>>> Thanks,
>>>>> - Chansup
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> M.
>>>>>>
>>>>>>
>>>>>> cbyun wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Michal,
>>>>>>>
>>>>>>> Yes, I made a few changes in the SGE configuration.
>>>>>>>
>>>>>>> I added a couple of RQS rules, a couple of new cluster queues, and a
>>>>>>> new hostgroup, @nohosts.
>>>>>>>
>>>>>>> Then, to keep all.q from being used, I assigned the @nohosts group
>>>>>>> to all.q.
>>>>>>>
>>>>>>> I believe the issue appeared after these customizations.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Chansup
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
>>>>>>>> Sent: Monday, August 24, 2009 3:28 AM
>>>>>>>> To: users at gridengine.sunsource.net
>>>>>>>> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>>>>>>>>
>>>>>>>> Chansup,
>>>>>>>>
>>>>>>>> has your SGE changed in any way? The error is coming from jgdi
>>>>>>>> (from the SGE side). Did you do some kind of "upgrade/downgrade" of
>>>>>>>> jgdi.jar? I have not seen this error before, so I will need to dig
>>>>>>>> into it. I will let you know once I find something.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Michal
>>>>>>>>
>>>>>>>> cbyun wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Somehow my ge adapter service got in trouble and I couldn't start it
>>>>>>>>> anymore:
>>>>>>>>>
>>>>>>>>> # sdmadm suc -c gesvc2
>>>>>>>>> comp   host            message
>>>>>>>>> ----------------------------------------
>>>>>>>>> gesvc2 llgriddev.local startup triggered
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 08/21/2009 16:25:11|20|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc2: Starting Grid Engine service
>>>>>>>>> 08/21/2009 16:25:11|20|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>>>>>>>>   |set_object_attribute: set_list of property reportVariables failed
>>>>>>>>>   |
>>>>>>>>> 08/21/2009 16:25:11|21|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc2: Error in startup procedure: Service gesvc2: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
>>>>>>>>>   |set_object_attribute: set_list of property reportVariables failed
>>>>>>>>>   |
>>>>>>>>>
>>>>>>>>> Is there any way to clear up this error?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> - Chansup
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------
>>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213536
>>>>>>>>>
>>>>>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>> --
>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>> Andre Alefeld                Phone: ++49 (0)941 3075-255
>> Software Engineering         Fax:   ++49 (0)941 3075-222
>> Sun Microsystems GmbH
>> Dr.-Leo-Ritter-Str. 7      mailto: andre.alefeld at sun.com
>> D-93049 Regensburg           http://www.sun.com/gridware
>>
>>
>>
>
>



