[GE users] Re: hedeby communication problem: sdmadm on master cannot find itself

rhierlmeier richard.hierlmeier at sun.com
Mon Mar 15 13:55:01 GMT 2010


Hi Chris,

What was the output of the sdmadm suj command? How long did it run?

Can you send me the log file of the JVM? It is stored in
<local_spool_dir>/log/cs_vm-0.log

In your scenario the <local_spool_dir> is /var/spool/sdm/awssge
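
A minimal sketch for grabbing the tail of that log, assuming the local spool
dir is /var/spool/sdm/awssge as reported in the install output quoted below.
The helper name jvm_log_path is only illustrative, and the "-0" suffix is
taken from the file name above:

```shell
# jvm_log_path SPOOL_DIR JVM_NAME
# Print the log path in the <local_spool_dir>/log/<jvm>-0.log layout.
# (Hypothetical helper; the "-0" suffix is assumed from the name above.)
jvm_log_path() {
    printf '%s/log/%s-0.log\n' "$1" "$2"
}

log=$(jvm_log_path /var/spool/sdm/awssge cs_vm)

if [ -r "$log" ]; then
    # The last lines usually cover the most recent startup attempt.
    tail -n 100 "$log"
else
    echo "cannot read $log (wrong spool dir, or insufficient permissions?)" >&2
fi
```

Run it as root or as the hedeby user, since the JVM runs under that account.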

Please also check that the hostname awssge resolves correctly on the
Hedeby master host.
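
One way to check this, as a rough sketch: it assumes a Linux host where
getent is available, and the function name resolves is only illustrative.
It asks the system resolver (/etc/hosts, DNS, and so on, per nsswitch.conf)
about each name passed on the command line:

```shell
#!/bin/sh
# resolves NAME
# Succeed if the system resolver knows NAME. (Illustrative helper.)
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Check every name given as an argument; with no arguments nothing is checked.
for name in "$@"; do
    if resolves "$name"; then
        # Print the first address the resolver returns for this name.
        echo "$name -> $(getent hosts "$name" | awk '{print $1; exit}')"
    else
        echo "$name does not resolve"
    fi
done
```

Saved under a name of your choice, for example check_resolve.sh, it could be
run on the master host as "sh check_resolve.sh awssge awssge.cbio.mskcc.org".
Both the short and the fully qualified name should resolve to an address of
the master host itself.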



Richard


> I am having trouble setting up a test Hedeby installation. sdmadm cannot communicate with the Java processes on the local system.
> 
>     I installed SGE with JMX & cluster name 'awssge', then followed <http://wiki.gridengine.info/wiki/index.php/SGE-Hedeby-And-Amazon-EC2#HowTo:_Setup_the_Grid_Engine_6.2_Master> with 'hedeby1' as the SDM_SYSTEM name. "sdmadm suj" did *start* the Java processes, but was unable to report on their health.
> 
>     I tried again, following <http://wikis.sun.com/display/gridengine62u3/SDM+Installation+Overview> with 'awssge' as the SDM_MASTER name to match the cluster name, but have the same problems.
> 
> 
>     My SDM installation command was:
> 
>> ~hedeby/bin/sdmadm -s awssge -p system install_master_host -ca_admin_mail '****' -ca_org "Memorial Sloan-Kettering Cancer Center" -ca_org_unit "Computational Biology" -ca_country US -au hedeby -sge_root /common/sge/ -ca_location "New York City" -cs_port 6446 -ca_state "New York"
> 
>     Its output (aside from the license text) was:
> 
>> Do you agree with the terms of the license ? (Y/N)y
>> The License has been accepted by the user.
>> Install master host command is using default local spool dir: /var/spool/sdm/awssge
>> A configuration for system "awssge" has been added.
> 
> 
>     The processes started by 'sdmadm suj' are:
> 
>> [root@awssge ~]# ps -ef|grep java
>> root      3863  3855 30 11:47 pts/1    00:00:01 /usr/java/default/bin/java -Djava.library.path=/common/sdm/lib/lx-amd64 -Djava.endorsed.dirs=/common/sdm/lib/ext/endorsed -Dcom.sun.grid.grm.management.connectionTimeout=20 -Djava.security.manager=java.rmi.RMISecurityManager -Djava.security.policy=/common/sdm/util/sdmadm.policy -jar /common/sdm/lib/sdm-starter.jar com.sun.grid.grm.cli.SdmAdm suj
>> hedeby    3927  3926 30 11:47 ?        00:00:01 /usr/java/jre1.6.0_18/bin/java -Djava.security.manager=java.rmi.RMISecurityManager -Djava.security.policy==/var/spool/sdm/awssge/security/java.policy -Djava.security.auth.login.config=/var/spool/sdm/awssge/security/jaas.config -Dcom.sun.grid.grm.bootstrap.systemname=awssge -Dcom.sun.grid.grm.bootstrap.jvmname=cs_vm -Dcom.sun.grid.grm.bootstrap.localspool=/var/spool/sdm/awssge -Dcom.sun.grid.grm.bootstrap.dist=/common/sdm -Dcom.sun.grid.grm.bootstrap.csInfo=awssge.cbio.mskcc.org:6446 -Dcom.sun.grid.grm.bootstrap.preferencesType=SYSTEM -Djava.util.logging.manager=com.sun.grid.grm.util.GrmLogManager -Djava.library.path=/common/sdm/lib/lx-amd64::/common/sdm/lib/lx-amd64: -Dcom.sun.grid.grm.bootstrap.isCS=true -cp /common/sdm/lib/sdm-security.jar:/common/sdm/lib/sdm-starter.jar:/common/sdm/lib/sdm-cloud-adapter.jar:/common/sdm/lib/sdm-upgrade.jar:/common/sdm/lib/sdm-common.jar:/common/sdm/lib/sdm-ge-adapter.jar:/common/sdm/lib/ext/jaxb-impl.jar:/common/sdm/lib/ext/activation.jar:/common/sdm/lib/ext/jsr173_1.0_api.jar -Djava.rmi.server.codebase=file:/common/sdm/lib/sdm-security.jar file:/common/sdm/lib/sdm-starter.jar file:/common/sdm/lib/sdm-cloud-adapter.jar file:/common/sdm/lib/sdm-upgrade.jar file:/common/sdm/lib/sdm-common.jar file:/common/sdm/lib/sdm-ge-adapter.jar file:/common/sdm/lib/ext/jaxb-impl.jar file:/common/sdm/lib/ext/activation.jar file:/common/sdm/lib/ext/jsr173_1.0_api.jar -Djava.endorsed.dirs=/common/sdm/lib/ext/endorsed -Djava.rmi.server.hostname=awssge.cbio.mskcc.org -Xmx128M -Dcom.sun.grid.grm.management.connectionTimeout=60 com.sun.grid.grm.bootstrap.JVMImpl
>> root      3993  2439  0 11:47 pts/0    00:00:00 grep java
> 
>     But sdmadm cannot see them:
> 
>> [root@awssge ~]# ~hedeby/bin/sdmadm -s awssge sj
>> Error: Cannot connect to JVM cs_vm at awssge_cbio_mskcc_org: Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: awssge
>> [root@awssge ~]# ~hedeby/bin/sdmadm -s awssge sc
>> Error: Cannot connect to JVM cs_vm at awssge_cbio_mskcc_org: Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: awssge
>> [root@awssge ~]# ~hedeby/bin/sdmadm -s awssge sbc
>> system type   host                  port properties
>> ---------------------------------------------------
>> awssge SYSTEM awssge.cbio.mskcc.org 6446
>> [root@awssge ~]#
> 
>     What am I doing wrong?
> 
> Thanks,
> 
> Chris Pepper 


-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Richard Hierlmeier           Phone: ++49 (0)941 3075-223
Software Engineering         Fax:   ++49 (0)941 3075-222
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7	     mailto: richard.hierlmeier at sun.com
D-93049 Regensburg           http://www.sun.com/grid

Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028
Geschäftsführer: Thomas Schröder, Wolfgang Engels
Vorsitzender des Aufsichtsrates: Martin Häring

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248722
