[GE users] hedeby communication problem: sdmadm on master cannot find itself

reppep pepper at cbio.mskcc.org
Mon Mar 8 18:25:05 GMT 2010


I am having trouble setting up a test Hedeby installation. sdmadm cannot communicate with the Java processes on the local system.

	I installed SGE with JMX enabled and the cluster name 'awssge', then followed <http://wiki.gridengine.info/wiki/index.php/SGE-Hedeby-And-Amazon-EC2#HowTo:_Setup_the_Grid_Engine_6.2_Master> with 'hedeby1' as the SDM_SYSTEM name. "sdmadm suj" did *start* the Java processes, but sdmadm was unable to report on their health.

	I tried again, following <http://wikis.sun.com/display/gridengine62u3/SDM+Installation+Overview> with 'awssge' as the SDM_MASTER name to match the cluster name, but I see the same problem.


	My SDM installation command was:

> ~hedeby/bin/sdmadm -s awssge -p system install_master_host -ca_admin_mail '****' -ca_org "Memorial Sloan-Kettering Cancer Center" -ca_org_unit "Computational Biology" -ca_country US -au hedeby -sge_root /common/sge/ -ca_location "New York City" -cs_port 6446 -ca_state "New York"

	Its output (aside from the license text) was:

> Do you agree with the terms of the license ? (Y/N)y
> The License has been accepted by the user.
> Install master host command is using default local spool dir: /var/spool/sdm/awssge
> A configuration for system "awssge" has been added.


	The processes started by 'sdmadm suj' are:

> [root at awssge ~]# ps -ef|grep java
> root	  3863  3855 30 11:47 pts/1	00:00:01 /usr/java/default/bin/java -Djava.library.path=/common/sdm/lib/lx-amd64 -Djava.endorsed.dirs=/common/sdm/lib/ext/endorsed -Dcom.sun.grid.grm.management.connectionTimeout=20 -Djava.security.manager=java.rmi.RMISecurityManager -Djava.security.policy=/common/sdm/util/sdmadm.policy -jar /common/sdm/lib/sdm-starter.jar com.sun.grid.grm.cli.SdmAdm suj
> hedeby	3927  3926 30 11:47 ?		00:00:01 /usr/java/jre1.6.0_18/bin/java -Djava.security.manager=java.rmi.RMISecurityManager -Djava.security.policy==/var/spool/sdm/awssge/security/java.policy -Djava.security.auth.login.config=/var/spool/sdm/awssge/security/jaas.config -Dcom.sun.grid.grm.bootstrap.systemname=awssge -Dcom.sun.grid.grm.bootstrap.jvmname=cs_vm -Dcom.sun.grid.grm.bootstrap.localspool=/var/spool/sdm/awssge -Dcom.sun.grid.grm.bootstrap.dist=/common/sdm -Dcom.sun.grid.grm.bootstrap.csInfo=awssge.cbio.mskcc.org:6446 -Dcom.sun.grid.grm.bootstrap.preferencesType=SYSTEM -Djava.util.logging.manager=com.sun.grid.grm.util.GrmLogManager -Djava.library.path=/common/sdm/lib/lx-amd64::/common/sdm/lib/lx-amd64: -Dcom.sun.grid.grm.bootstrap.isCS=true -cp /common/sdm/lib/sdm-security.jar:/common/sdm/lib/sdm-starter.jar:/common/sdm/lib/sdm-cloud-adapter.jar:/common/sdm/lib/sdm-upgrade.jar:/common/sdm/lib/sdm-common.jar:/common/sdm/lib/sdm-ge-adapter.jar:/common/sdm/lib/ext/jaxb-impl.jar:/common/sdm/lib/ext/activation.jar:/common/sdm/lib/ext/jsr173_1.0_api.jar -Djava.rmi.server.codebase=file:/common/sdm/lib/sdm-security.jar file:/common/sdm/lib/sdm-starter.jar file:/common/sdm/lib/sdm-cloud-adapter.jar file:/common/sdm/lib/sdm-upgrade.jar file:/common/sdm/lib/sdm-common.jar file:/common/sdm/lib/sdm-ge-adapter.jar file:/common/sdm/lib/ext/jaxb-impl.jar file:/common/sdm/lib/ext/activation.jar file:/common/sdm/lib/ext/jsr173_1.0_api.jar  -Djava.endorsed.dirs=/common/sdm/lib/ext/endorsed -Djava.rmi.server.hostname=awssge.cbio.mskcc.org -Xmx128M -Dcom.sun.grid.grm.management.connectionTimeout=60 com.sun.grid.grm.bootstrap.JVMImpl
> root	  3993  2439  0 11:47 pts/0	00:00:00 grep java

	But sdmadm cannot see them:

> [root at awssge ~]# ~hedeby/bin/sdmadm -s awssge sj
> Error: Cannot connect to JVM cs_vm at awssge_cbio_mskcc_org: Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: awssge
> [root at awssge ~]# ~hedeby/bin/sdmadm -s awssge sc
> Error: Cannot connect to JVM cs_vm at awssge_cbio_mskcc_org: Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: awssge
> [root at awssge ~]# ~hedeby/bin/sdmadm -s awssge sbc
> system type   host                  port properties
> ---------------------------------------------------
> awssge SYSTEM awssge.cbio.mskcc.org 6446
> [root at awssge ~]#
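
	One check that might help narrow this down is whether anything is accepting TCP connections on the CS host and port reported by 'sdmadm sbc'. The following is only a minimal sketch, assuming Python 3 is available on the master; it is not part of Hedeby, and `port_is_open` is a hypothetical helper name. It confirms only that the port is reachable and does not test the RMI name lookup that the error message mentions.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; not part of sdmadm or Hedeby.
    """
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

	For this installation the call would be port_is_open("awssge.cbio.mskcc.org", 6446), using the host and port from the sbc output above. A False result would suggest a listener, firewall, or name-resolution problem rather than a problem inside the JVM.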

	What am I doing wrong?

Thanks,

Chris Pepper

-- 
Chris Pepper:                <http://cbio.mskcc.org/>
                             <http://www.extrapepperoni.com/>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=247543
