No subject


Wed Jan 12 20:38:46 GMT 2011


Andre

cbyun wrote:
Hi Andre,

I'm using the SGE 6.2u3 distribution.

# sdmadm sm
module             version vendor
-------------------------------------------
cloud-adapter      1.0     Sun Microsystems
common             1.0u3   Sun Microsystems
gridengine-adapter 1.0u3   Sun Microsystems
security           1.0u3   Sun Microsystems


I did add the permission but I'm still getting the same error:

# diff java.policy java.policy.orig
13d12
<    permission java.lang.RuntimePermission "modifyThread";
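
A diff only shows that the line was added somewhere in the file. To confirm that it sits inside the grant block for jgdi.jar, a small awk filter can help. This is only a sketch: the function name is made up for illustration, and it assumes each grant header (including its codeBase) is on a single line. Note also that a JVM normally reads its policy file at startup, so the qmaster JVM probably has to be restarted before a change takes effect.

```shell
# Succeeds (exit 0) if <permission-name> is granted inside the grant block
# whose header mentions jgdi.jar; fails (exit 1) otherwise.
# Usage: jgdi_grant_has_permission <policy-file> <permission-name>
# (hypothetical helper, not part of SGE)
jgdi_grant_has_permission() {
    awk -v perm="$2" '
        # A new grant header: remember whether it belongs to jgdi.jar.
        /^[ \t]*grant/ { in_block = ($0 ~ /jgdi\.jar/) }
        # A permission line naming the quoted target inside that block.
        in_block && /^[ \t]*permission/ && index($0, "\"" perm "\"") { found = 1 }
        # A closing brace ends the current block.
        /^[ \t]*[}]/ { in_block = 0 }
        END { exit !found }
    ' "$1"
}
```

For example: jgdi_grant_has_permission $SGE_ROOT/default/common/jmx/java.policy modifyThread && echo present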

# sdmadm sdj -all -h localhost
jvm   host            result  message
-------------------------------------
cs_vm llgriddev.local STOPPED

26/08/2009 12:15:07|25|jgdi.jni.EventClientImpl.close|W|Close of event client failed
                              java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThread)
                                java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
                                java.security.AccessController.checkPermission(AccessController.java:546)
                                java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
                                java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
                                java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
                                com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
                                com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
                                com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
                                com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
                                com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)


I compared the jar files in both directories and they differ in size and date.

# pwd
/usr/local/sge/62u3/lib
# ls -l *jar
-rw-r--r-- 1 root root  53280 Jun  8 05:36 drmaa.jar
-rw-r--r-- 1 root root 986077 Jun  8 05:36 jgdi.jar
-rw-r--r-- 1 root root 157498 Jun  8 05:36 juti.jar

# pwd
/usr/local/sge/62u3/sgeinspect/sgeinspect/modules/ext
# ls -l *jar
-rw-r--r-- 1 root root  309294 Jun  4 09:40 jcommon-1.0.15.jar
-rw-r--r-- 1 root root 1368681 Jun  4 09:40 jfreechart-1.0.12.jar
-rw-r--r-- 1 root root  987594 Jun  4 09:40 jgdi.jar
-rw-r--r-- 1 root root  157557 Jun  4 09:40 juti.jar
-rw-r--r-- 1 root root  128224 Jun  4 09:40 sdm-cloud-adapter.jar
-rw-r--r-- 1 root root 1535432 Jun  4 09:40 sdm-common.jar
-rw-r--r-- 1 root root  140989 Jun  4 09:40 sdm-ge-adapter-impl.jar
-rw-r--r-- 1 root root   66867 Jun  4 09:40 sdm-ge-adapter.jar
-rw-r--r-- 1 root root   26982 Jun  4 09:40 sdm-security-impl.jar
-rw-r--r-- 1 root root  158563 Jun  4 09:40 sdm-security.jar
-rw-r--r-- 1 root root   18986 Jun  4 09:40 sdm-starter.jar


Also, I copied both jgdi.jar and juti.jar from $SGE_ROOT/lib to $SGE_ROOT/sgeinspect/sgeinspect/modules/ext. It didn't help at all.
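
Size and date alone can mislead after a copy, so a byte-for-byte comparison is a more reliable way to see whether two directories really hold the same jars. The following is a minimal sketch using cmp; the function name is invented for illustration.

```shell
# For every *.jar in <src-dir>, report whether <dst-dir> holds a
# byte-identical file of the same name.
# Usage: compare_jars <src-dir> <dst-dir>   (hypothetical helper)
compare_jars() {
    for jar in "$1"/*.jar; do
        [ -e "$jar" ] || continue          # no jars at all in <src-dir>
        name=$(basename "$jar")
        if [ ! -e "$2/$name" ]; then
            echo "$name missing"
        elif cmp -s "$jar" "$2/$name"; then
            echo "$name identical"
        else
            echo "$name differs"
        fi
    done
}
```

For example: compare_jars $SGE_ROOT/lib $SGE_ROOT/sgeinspect/sgeinspect/modules/ext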

I'm also wondering how the inspect modules affect the SDM operation.
Should I copy those jar files to the <SDM_INSTALL>/lib directory?

Thanks,
- Chansup



________________________________
From: Andre.Alefeld at sun.com [mailto:Andre.Alefeld at sun.com]
Sent: Wednesday, August 26, 2009 9:13 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] SDM GE adapter sve got in trouble

Hi Chansup,

can you try to add the following permission to the first grant block in $SGE_ROOT/default/common/jmx/java.policy (and $SGE_ROOT/util/java.policy.template)?

grant codeBase "file:${com.sun.grid.jgdi.sgeRoot}/lib/jgdi.jar" {
....
   permission java.lang.RuntimePermission "modifyThread";
....
};

Then the exception should no longer appear.

I guess it is not the cause of the STU_name problem.
Maybe there is some mismatch between the jgdi.jar versions. Did you rebuild the gridengine code yourself? Or did you use everything from the distribution?
Could you try to copy the *.jar files from $SGE_ROOT/lib/ to $SGE_ROOT/sgeinspect/sgeinspect/modules/ext and check if the problem still occurs?
I will try to set up the arco variables you sent in your last email and check if I can reproduce it on my setup.

Andre


cbyun wrote:

I got some more information.

When I started the GE adapter service, the JGDI log showed the following access denied error.

# sdmadm suc -c gesvc
comp  host            message
---------------------------------------
gesvc llgriddev.local startup triggered

# tail -f default/spool/qmaster/jgdi0.log
...
24/08/2009 14:46:54|11|jgdi.jni.EventClientImpl.close|W|Close of event client failed
                              java.security.AccessControlException: access denied (java.lang.RuntimePermission modifyThread)
                                java.security.AccessControlContext.checkPermission(AccessControlContext.java:323)
                                java.security.AccessController.checkPermission(AccessController.java:546)
                                java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
                                java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1094)
                                java.util.concurrent.Executors$DelegatedExecutorService.shutdown(Executors.java:591)
                                com.sun.grid.jgdi.jni.EventClientImpl.close(EventClientImpl.java:157)
                                com.sun.grid.jgdi.management.NotificationBridge.close(NotificationBridge.java:182)
                                com.sun.grid.jgdi.management.JGDISession.close(JGDISession.java:139)
                                com.sun.grid.jgdi.management.JGDISession.closeSession(JGDISession.java:219)
                                com.sun.grid.jgdi.management.JGDIAgent$MyNotificationListener.handleNotification(JGDIAgent.java:403)
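
When a log like this grows, it can help to pull out only the warning and error entries and skip the stack trace lines. The sketch below assumes the five pipe-separated fields visible in the excerpt (timestamp, thread id, source, level, message); the function name is made up for illustration.

```shell
# Print timestamp, level, source and message of every W or E entry.
# Lines without the pipe-separated header (e.g. stack frames) are skipped.
# Usage: log_problems <logfile>...   (hypothetical helper)
log_problems() {
    awk -F'|' 'NF >= 5 && ($4 == "W" || $4 == "E") {
        msg = $5
        # Re-join the message if it contained pipe characters itself.
        for (i = 6; i <= NF; i++) msg = msg "|" $i
        print $1 " [" $4 "] " $3 ": " msg
    }' "$@"
}
```

For example: log_problems default/spool/qmaster/jgdi0.log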



Thanks,

- Chansup







-----Original Message-----

From: Ryszard.Macidlowski at sun.com<mailto:Ryszard.Macidlowski at sun.com> [mailto:Ryszard.Macidlowski at sun.com]

Sent: Monday, August 24, 2009 1:09 PM

To: users at gridengine.sunsource.net<mailto:users at gridengine.sunsource.net>

Subject: Re: [GE users] SDM GE adapter sve got in trouble



Hi Chansup,



From the error that I see, I don't think that it's an SDM adapter problem.
During startup the adapter connects to the qmaster (jmx thread) using jgdi
and retrieves data over this jgdi connection. What I can see in your
stack trace is that you get an IllegalStateException from jgdi. As long as
jgdi throws this exception, the service will not start and nothing can be
done on the SDM side. You didn't make any modifications in
ge_adapter_svc_config.xml, I suppose. So either there is a problem with
qmaster (try to restart qmaster and see if you are able to start gesvc;
I suppose you already tried this and it is not the problem), or there is a
problem or possible bug in jgdi. You possibly customized SGE in a way
that jgdi cannot report or process the customized values.

So to clear the error I suppose you could "undo" your customizations. I
would suggest doing it step by step (after each step, try to start the SDM
gesvc service to see if it starts, and so track down the problematic change).

The second approach would be to install a new SGE cell with the default
setup and add the customizations step by step (after each step, try to start
the SDM gesvc service to see if it starts, and so track down the problematic change).
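
The step-by-step idea can be sketched as a small shell loop: apply one change, run a check, stop at the first change after which the check fails. Everything here is illustrative. The function name is invented, and both the steps and the check command are placeholders that would have to be filled in with the real qconf changes and with whatever test confirms that gesvc started.

```shell
# Apply each step in order; after each one run the check command and
# stop at the first step after which the check no longer passes.
# Usage: first_failing_step <check-command> <step-command>...
# (hypothetical helper; steps and check are supplied by the caller)
first_failing_step() {
    check=$1
    shift
    for step in "$@"; do
        # Apply the change; give up if the change itself cannot be made.
        sh -c "$step" || { echo "step could not be applied: $step"; return 2; }
        # Run the check; a non-zero exit marks this step as the culprit.
        if ! sh -c "$check" >/dev/null 2>&1; then
            echo "check failed after: $step"
            return 1
        fi
    done
    echo "all steps applied, check still passes"
}
```

The same loop covers both approaches: the steps can either undo customizations on the existing cell or add them one at a time to a fresh cell.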



Of course the third approach would be to debug jgdi :)

BTW, have you checked the jgdi logs to see if there is any information?

Rys



cbyun wrote:

Hi Michal,

I tried the following steps but they still failed to clear the issue.
Any suggestions for clearing it?

- Remove GE adapter service
        sdmadm rs -s gesvc -force

- Shutdown/Startup SDM master service
        sdmadm sdj -all -h localhost
        sdmadm suj

- Add GE adapter service
        sdmadm ags -h localhost -j cs_vm -s gesvc -f \
                  <path>/ge_adapter_svc_config.xml

- Startup GE component
        sdmadm suc -c gesvc -h localhost

I'm still getting the same error:



08/24/2009 12:20:08|16|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc: Starting Grid Engine service
08/24/2009 12:20:08|16|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
|set_object_attribute: set_list of property reportVariables failed
|
08/24/2009 12:20:08|17|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc: Error in startup procedure: Service gesvc: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
|set_object_attribute: set_list of property reportVariables failed
|

Thanks,
- Chansup









-----Original Message-----
From: cbyun [mailto:cbyun at ll.mit.edu]
Sent: Monday, August 24, 2009 9:51 AM
To: users at gridengine.sunsource.net
Subject: RE: [GE users] SDM GE adapter sve got in trouble

Michal,

-----Original Message-----
From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
Sent: Monday, August 24, 2009 9:23 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] SDM GE adapter sve got in trouble

Chansup,

actually, is something called "STU_name" part of your changes? The error
says something about "STU_name" not being part of the descriptor, so I'd
like to know whether it is something you introduced or touched or not.

No, I have nothing called "STU_name" in my configuration.
Here is my configuration:



# qconf -sconf
#global:
execd_spool_dir              /var/spool/sge
mailer                       /bin/mail
xterm                        /usr/bin/X11/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 sh,ksh,csh,tcsh
min_uid                      0
min_gid                      0
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           none
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 none
reporting_params             accounting=true reporting=true \
                             flush_time=00:00:15 joblog=true sharelog=00:00:00
finished_jobs                100
gid_range                    20000-20100
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
max_advance_reservations     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 0
jsv_url                      none
libjvm_path                  /usr/java/latest/jre/lib/amd64/server/libjvm.so
additional_jvm_args          -Xmx256m
jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w

# qconf -ssconf
algorithm                         default
schedule_interval                 0:2:0
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              NONE
load_adjustment_decay_time        0:0:0
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  2
flush_finish_sec                  2
params                            none
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         0
weight_tickets_share              0
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               FALSE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.010000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.100000
weight_priority                   1.000000
max_reservation                   0
default_duration                  INFINITY

# qconf -srqs
{
   name         host_slot_limit
   description  Limit total number of slots per hosts (assume uniform machines)
   enabled      TRUE
   limit        hosts {@allhosts} to slots=2
}
{
   name         max_u_jobs
   description  max jobs per user
   enabled      TRUE
   limit        users {*} to slots=256
}





# for i in `qconf -sql`; do echo " ";echo Qname: $i; qconf -sq $i; done

Qname: all.q
qname                 all.q
hostlist              @nohosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

Qname: normal
qname                 normal
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

Qname: pmatlab
qname                 pmatlab
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY





Currently no hosts are assigned to either of the host groups, since none of
the hosts is being used by the SGE adapter service:

# qconf -shgrp @allhosts
group_name @allhosts
hostlist NONE

# qconf -shgrp @nohosts
group_name @nohosts
hostlist NONE

I also turned on the exclusive mode:

# qconf -sc
#name               shortcut   type        relop   requestable consumable default  urgency
#-----------------------------------------------------------------------------------------
...
exclusive           excl       BOOL        EXCL    YES         YES        0        1000

Thanks,
- Chansup


M.

cbyun wrote:

Michal,

Yes, I made a few changes in the SGE configuration.

I added a couple of RQS rules, a couple of new cluster queues and a new
hostgroup, @nohosts.

Then, in order to keep all.q from being used, I assigned the @nohosts group
to all.q.

I believe the issue appeared after these customizations.

Thanks,
- Chansup











-----Original Message-----
From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
Sent: Monday, August 24, 2009 3:28 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] SDM GE adapter sve got in trouble

Chansup,

did your SGE change in any way? There is an error coming from jgdi
(from the SGE side). Did you do some kind of "upgrade/downgrade" of
jgdi.jar? I have not seen such an error before, so I will need to dig
into it - I will let you know once I find something.

Regards,

Michal



cbyun wrote:

Hi,

Somehow my ge adapter service got in trouble and I couldn't start it any
more:

# sdmadm suc -c gesvc2
comp   host            message
----------------------------------------
gesvc2 llgriddev.local startup triggered

08/21/2009 16:25:11|20|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc2: Starting Grid Engine service
08/21/2009 16:25:11|20|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
|set_object_attribute: set_list of property reportVariables failed
|
08/21/2009 16:25:11|21|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc2: Error in startup procedure: Service gesvc2: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
|set_object_attribute: set_list of property reportVariables failed
|

Is there any way to clear up this error?

Thanks,
- Chansup



------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213536

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].








--
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Andre Alefeld                Phone: ++49 (0)941 3075-255
Software Engineering         Fax:   ++49 (0)941 3075-222
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7        mailto: andre.alefeld at sun.com
D-93049 Regensburg           http://www.sun.com/gridware





