[GE users] Re: [GE users] hedeby communication problem: sdmadm on master cannot find itself

rhierlmeier richard.hierlmeier at sun.com
Mon Mar 15 14:40:57 GMT 2010



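Hi Chris,

It seems that you mixed up your ports. I bet that 6446 is the port of the JMX 
server of your Grid Engine qmaster. The cs_vm log shows that the MBeanServer 
could not be created because port 6446 was already in use.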

SDM needs its own port. Please retry the SDM master installation with a unique 
port (option -cs_port).
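Before rerunning the installation, it can help to confirm that the port you pass to -cs_port is not already taken. The following is a minimal Python sketch, not part of SDM, and the function names are hypothetical. It tries to bind a TCP socket to a candidate port, which is the same operation that failed inside cs_vm with java.net.BindException. It assumes the SDM JVM listens on all IPv4 interfaces. A port reported as free can still be taken by another process before SDM starts.

```python
import errno
import socket
from typing import Iterable, Optional


def tcp_port_is_free(port: int, host: str = "") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now.

    host="" means all IPv4 interfaces. A bind failing with EADDRINUSE is the
    condition that cs_vm reported as java.net.BindException.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # other errors, e.g. permission denied below port 1024
    return True


def first_free_port(candidates: Iterable[int]) -> Optional[int]:
    """Return the first candidate port that is currently free, or None."""
    for port in candidates:
        if tcp_port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    # 6446 is the port that was already in use on awssge in this thread.
    for port in (6446, 6447, 6448):
        print(port, "free" if tcp_port_is_free(port) else "in use")
```

Run this on the master host while the qmaster is up. A value reported as free is a reasonable candidate for -cs_port.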

You have to throw away the existing system. The fastest way is to delete the 
directories

     /etc/sdm/bootstrap/awssge
and /var/spool/sdm/awssge

Richard


On 03/15/10 15:13, reppep wrote:
> Richard,
> 
> 	awssge is the hedeby master host, and it can resolve itself.
> 
> Thanks,
> 
> Chris
> 
>> [root at awssge ~]# time ~hedeby/bin/sdmadm -s awssge sj
>> Error: Cannot connect to JVM cs_vm at awssge_cbio_mskcc_org: Failed to retrieve RMI
>>
>> real	0m0.553s
>> user	0m0.802s
>> sys	0m0.067s
>> [root at awssge ~]# time ~hedeby/bin/sdmadm -s awssge suj
>> jvm         host                  result message                       
>> -----------------------------------------------------------------------
>> cs_vm       awssge.cbio.mskcc.org ERROR  JVM: cs_vm died during startup.
>> executor_vm awssge.cbio.mskcc.org ERROR  Timeout. Pid file: /var/spool/sdm/awssg
>> rp_vm       awssge.cbio.mskcc.org ERROR  Timeout. Pid file: /var/spool/sdm/awssg
>> Error: Command has generated error.
>>
>> real	2m4.736s
>> user	0m3.755s
>> sys	0m1.073s
>> [root at awssge ~]# ps -ef | grep java
>> hedeby    3927     1  0 Mar08 ?        00:00:01 /usr/java/jre1.6.0_18/bin/java -Djava.security.manager=java.rmi.RMISecurityManager -Djava.security.policy==/var/spool/sdm/awssge/security/java.policy -Djava.security.auth.login.config=/var/spool/sdm/awssge/security/jaas.config -Dcom.sun.grid.grm.bootstrap.systemname=a
>> root     23610 22595  0 10:10 pts/0    00:00:00 grep java
>> [root at awssge ~]# !cat
>> cat /var/spool/sdm/awssge/log/cs_vm-0.log 
>> 03/08/2010 11:47:38|10|m.bootstrap.JVMImpl$PrivilegedStartAction.run|I|startup jvm (pid=3927)
>> 03/08/2010 11:47:39|11|.grm.bootstrap.JVMImpl$ComponentLifecycle.run|W|Error in lifecycle of component cs_vm: Cannot start component cs_vm: Can not create MBeanServer at port 6,446: Port already in use: 6446; nested exception is: 
>>                                                                       |	java.net.BindException: Address already in use
>> 03/08/2010 11:47:39|12|rid.grm.bootstrap.JVMImpl$ShutdownHandler.run|I|Got shutdown event
>> [root at awssge ~]# ping -c2 awssge
>> PING awssge.cbio.mskcc.org (140.163.254.41) 56(84) bytes of data.
>> 64 bytes from awssge.cbio.mskcc.org (140.163.254.41): icmp_seq=1 ttl=64 time=0.037 ms
>> 64 bytes from awssge.cbio.mskcc.org (140.163.254.41): icmp_seq=2 ttl=64 time=0.011 ms
>>
>> --- awssge.cbio.mskcc.org ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
>> rtt min/avg/max/mdev = 0.011/0.024/0.037/0.013 ms
> 
> 
> rhierlmeier wrote:
>> Hi Chris,
>>
>> What was the output of the sdmadm suj command? How long did it run?
>>
>> Can you send me the contents of the JVM's log file? It is stored in 
>> <local_spool_dir>/log/cs_vm-0.log
>>
>> In your scenario the <local_spool_dir> is /var/spool/sdm/awssge
>>
>> Please check also that the hostname awssge can be correctly resolved on the 
>> hedeby master host.
>>
>>
>>
>> Richard
>>
>>
>>> I am having trouble setting up a test Hedeby installation. sdmadm cannot communicate with the Java processes on the local system.
>>>
>>>     I installed SGE with JMX & cluster name 'awssge', then followed <http://wiki.gridengine.info/wiki/index.php/SGE-Hedeby-And-Amazon-EC2#HowTo:_Setup_the_Grid_Engine_6.2_Master> with 'hedeby1' as the SDM_SYSTEM name. "sdmadm suj" did *start* the Java processes, but was unable to report on their health.
>>>
>>>     I tried again, following <http://wikis.sun.com/display/gridengine62u3/SDM+Installation+Overview> with 'awssge' as the SDM_MASTER name to match the cluster name, but have the same problems.
>>>
>>>
>>>     My SDM installation command was:
>>>
>>>> ~hedeby/bin/sdmadm -s awssge -p system install_master_host -ca_admin_mail '****' -ca_org "Memorial Sloan-Kettering Cancer Center" -ca_org_unit "Computational Biology" -ca_country US -au hedeby -sge_root /common/sge/ -ca_location "New York City" -cs_port 6446 -ca_state "New York"
>>>     Its output (aside from the license text) was:
>>>
>>>> Do you agree with the terms of the license ? (Y/N)y
>>>> The License has been accepted by the user.
>>>> Install master host command is using default local spool dir: /var/spool/sdm/awssge
>>>> A configuration for system "awssge" has been added.
>>>     The processes started by 'sdmadm suj' are:
>>>
>>>> [root at awssge ~]# ps -ef|grep java
>>>> root      3863  3855 30 11:47 pts/1    00:00:01 /usr/java/default/bin/java -Djava.library.path=/common/sdm/lib/lx-amd64 -Djava.endorsed.dirs=/common/sdm/lib/ext/endorsed -Dcom.sun.grid.grm.management.connectionTimeout=20 -Djava.security.manager=java.rmi.RMISecurityManager -Djava.security.policy=/common/sdm/util/sdmadm.policy -jar /common/sdm/lib/sdm-starter.jar com.sun.grid.grm.cli.SdmAdm suj
>>>> hedeby    3927  3926 30 11:47 ?        00:00:01 /usr/java/jre1.6.0_18/bin/java -Djava.security.manager=java.rmi.RMISecurityManager -Djava.security.policy==/var/spool/sdm/awssge/security/java.policy -Djava.security.auth.login.config=/var/spool/sdm/awssge/security/jaas.config -Dcom.sun.grid.grm.bootstrap.systemname=awssge -Dcom.sun.grid.grm.bootstrap.jvmname=cs_vm -Dcom.sun.grid.grm.bootstrap.localspool=/var/spool/sdm/awssge -Dcom.sun.grid.grm.bootstrap.dist=/common/sdm -Dcom.sun.grid.grm.bootstrap.csInfo=awssge.cbio.mskcc.org:6446 -Dcom.sun.grid.grm.bootstrap.preferencesType=SYSTEM -Djava.util.logging.manager=com.sun.grid.grm.util.GrmLogManager -Djava.library.path=/common/sdm/lib/lx-amd64::/common/sdm/lib/lx-amd64: -Dcom.sun.grid.grm.bootstrap.isCS=true -cp /common/sdm/lib/sdm-security.jar:/common/sdm/lib/sdm-starter.jar:/common/sdm/lib/sdm-cloud-adapter.jar:/common/sdm/lib/sdm-upgrade.jar:/common/sdm/lib/sdm-common.jar:/common/sdm/lib/sdm-ge-adapter.jar:/common/sdm/lib/ext/jaxb-impl.jar:/common/sdm/lib/ext/activation.jar:/common/sdm/lib/ext/jsr173_1.0_api.jar -Djava.rmi.server.codebase=file:/common/sdm/lib/sdm-security.jar file:/common/sdm/lib/sdm-starter.jar file:/common/sdm/lib/sdm-cloud-adapter.jar file:/common/sdm/lib/sdm-upgrade.jar file:/common/sdm/lib/sdm-common.jar file:/common/sdm/lib/sdm-ge-adapter.jar file:/common/sdm/lib/ext/jaxb-impl.jar file:/common/sdm/lib/ext/activation.jar file:/common/sdm/lib/ext/jsr173_1.0_api.jar -Djava.endorsed.dirs=/common/sdm/lib/ext/endorsed -Djava.rmi.server.hostname=awssge.cbio.mskcc.org -Xmx128M -Dcom.sun.grid.grm.management.connectionTimeout=60 com.sun.grid.grm.bootstrap.JVMImpl
>>>> root      3993  2439  0 11:47 pts/0    00:00:00 grep java
>>>     But sdmadm cannot see them:
>>>
>>>> [root at awssge ~]# ~hedeby/bin/sdmadm -s awssge sj
>>>> Error: Cannot connect to JVM cs_vm at awssge_cbio_mskcc_org: Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: awssge
>>>> [root at awssge ~]# ~hedeby/bin/sdmadm -s awssge sc
>>>> Error: Cannot connect to JVM cs_vm at awssge_cbio_mskcc_org: Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: awssge
>>>> [root at awssge ~]# ~hedeby/bin/sdmadm -s awssge sbc
>>>> system type   host                  port properties
>>>> ---------------------------------------------------
>>>> awssge SYSTEM awssge.cbio.mskcc.org 6446
>>>> [root at awssge ~]# 
>>>     What am I doing wrong?
>>>
>>> Thanks,
>>>
>>> Chris Pepper 
>>
> 
> 
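Richard's earlier request to check that the host name resolves correctly on the master host can also be scripted. The JVM command lines above use the fully qualified name awssge.cbio.mskcc.org (in csInfo and java.rmi.server.hostname). A useful check is therefore whether the short name and the FQDN resolve to the same IPv4 addresses. The sketch below is an illustration only, and the helper names are hypothetical. It uses Python's getaddrinfo, which consults the same system resolver configuration (for example /etc/hosts and DNS) that ping uses.

```python
import socket
from typing import List


def ipv4_addresses(name: str) -> List[str]:
    """Return the sorted IPv4 addresses the local resolver returns for name.

    Raises socket.gaierror if the name cannot be resolved at all.
    """
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def names_agree(short_name: str, fqdn: str) -> bool:
    """True if both names resolve to the same set of IPv4 addresses."""
    return ipv4_addresses(short_name) == ipv4_addresses(fqdn)
```

On the master host, `names_agree("awssge", "awssge.cbio.mskcc.org")` returning False, or raising socket.gaierror, would point to a resolver problem. In this thread the ping output suggests resolution was fine, and the actual cause was the port conflict.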


-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Richard Hierlmeier           Phone: ++49 (0)941 3075-223
Software Engineering         Fax:   ++49 (0)941 3075-222
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7	     mailto: richard.hierlmeier at sun.com
D-93049 Regensburg           http://www.sun.com/grid

Registered office:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Commercial register: Amtsgericht München, HRB 161028
Managing Directors: Thomas Schröder, Wolfgang Engels
Chairman of the Supervisory Board: Martin Häring

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248746
