[GE users] cloud adapter ran into a hostname resolution collision issue

cbyun cbyun at ll.mit.edu
Thu Aug 13 15:51:17 BST 2009


I modified the cloud adapter configuration and reloaded it.
Then, when I submitted some jobs, it started more machines and logged the following events.

Some of the resources are not being used by any service, although they are running JVMs.

How can I clear the current issue?

08/13/2009 10:44:24|91|pl.cloud.Ec2CloudResponseParser.getSingleInfo|W|Service power:Unexpected amount (2) of items, using com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl at 691d2d94.
08/13/2009 10:44:24|91|pl.cloud.Ec2CloudResponseParser.getSingleInfo|W|Service power:Unexpected amount (2) of items, using com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl at 7a9b8575.
08/13/2009 10:44:24|91|pl.cloud.Ec2CloudResponseParser.getSingleInfo|W|Service power:Unexpected amount (2) of items, using com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl at 164d1de0.
08/13/2009 10:44:24|91|pl.cloud.Ec2CloudResponseParser.getSingleInfo|W|Service power:Unexpected amount (2) of items, using com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl at 20e46890.
08/13/2009 10:44:24|91|pl.cloud.Ec2CloudResponseParser.getSingleInfo|W|Service power:Unexpected amount (2) of items, using com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl at 738cd6ce.
08/13/2009 10:44:24|91|pl.cloud.Ec2CloudResponseParser.getSingleInfo|W|Service power:Unexpected amount (2) of items, using com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl at 236aa9a6.
08/13/2009 10:44:24|91|pl.cloud.Ec2CloudResponseParser.getSingleInfo|W|Service power:Unexpected amount (2) of items, using com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl at 2fa7e374.
08/13/2009 10:44:24|91|pl.cloud.Ec2CloudResponseParser.getSingleInfo|W|Service power:Unexpected amount (2) of items, using com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl at 3745ed1d.
08/13/2009 10:44:24|91|2CloudResponseParser.parseXMLForCloudHostInfo|E|Service power:Hostname resoving collision! InstanceIdList no longer corresponds to Set<CloudHostInfo>\n instanceIdList=[i-blade-0-0.local, i-blade-0-9.local, i-blade-0-1.local, i-blade-0-2.local, i-blade-0-3.local, i-blade-0-4.local, i-blade-0-5.local, i-blade-0-6.local, i-blade-0-8.local, i-blade-0-9.local], info=[hostname: llgriddev.local, instanceId: i-blade-0-9.local, launchTime: null] , set=[[hostname: blade-0-8.local, instanceId: i-blade-0-8.local, launchTime: 2009-08-13T10:44:24.000Z] , [hostname: blade-0-6.local, instanceId: i-blade-0-6.local, launchTime: 2009-08-13T10:44:24.000Z] , [hostname: blade-0-5.local, instanceId: i-blade-0-5.local, launchTime: 2009-08-13T10:44:24.000Z] , [hostname: blade-0-4.local, instanceId: i-blade-0-4.local, launchTime: 2009-08-13T10:44:24.000Z] , [hostname: blade-0-3.local, instanceId: i-blade-0-3.local, launchTime: 2009-08-13T10:44:24.000Z] , [hostname: blade-0-2.local, instanceId: i-blade-0-2.local, launchTime: 2009-08-13T10:44:24.000Z] , [hostname: blade-0-1.local, instanceId: i-blade-0-1.local, launchTime: 2009-08-13T10:44:24.000Z] , [hostname: llgriddev.local, instanceId: i-blade-0-9.local, launchTime: null] , [hostname: blade-0-0.local, instanceId: i-blade-0-0.local, launchTime: 2009-08-13T10:44:24.000Z] ]
08/13/2009 10:44:24|91|ud.executor.TaskQueueExecutor$Task$Worker.run|E|STask[638].worker[SEQ,0]: failed com.sun.grid.grm.service.impl.cloud.CloudServiceAdapterImpl$3 at 4fd22744 Caused bynull
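
As a quick check of what the collision error above reports, here is a minimal Python sketch that looks for repeated entries in the instanceIdList copied from that log line. The helper name `duplicated` is made up for illustration; it is not part of the cloud adapter.

```python
from collections import Counter

# instanceIdList copied verbatim from the "Hostname resoving collision" log line
instance_ids = [
    "i-blade-0-0.local", "i-blade-0-9.local", "i-blade-0-1.local",
    "i-blade-0-2.local", "i-blade-0-3.local", "i-blade-0-4.local",
    "i-blade-0-5.local", "i-blade-0-6.local", "i-blade-0-8.local",
    "i-blade-0-9.local",
]


def duplicated(ids):
    """Return the sorted IDs that occur more than once (hypothetical helper)."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


duplicates = duplicated(instance_ids)
print(duplicates)                                # ['i-blade-0-9.local']
print(len(instance_ids), len(set(instance_ids)))  # 10 9
```

The list has 10 entries but only 9 distinct IDs, which matches the 9 entries in the logged set. The repeated ID, i-blade-0-9.local, is also the one the log pairs with hostname llgriddev.local and a null launchTime.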


Currently all nodes are running JVMs, but some of them do not appear in any service:

[root@llgriddev power_drac]# sdmadm sj
name  host            state      used_mem  max_mem   message
---------------------------------------------------------------------------------------------
cs_vm blade-0-0.local STARTED           4M       28M
      blade-0-1.local STARTED           4M       28M
      blade-0-2.local STARTED           6M       28M
      blade-0-3.local STARTED           9M       28M
      blade-0-4.local STARTED           6M       28M
      blade-0-5.local STARTED           8M       28M
      blade-0-6.local STARTED           8M       28M
      blade-0-8.local STARTED           6M       28M
      blade-0-9.local STARTED           6M       28M
      llgriddev.local STARTED          41M      568M
[root@llgriddev power_drac]# sdmadm sr
service id              state    type flags usage annotation
------------------------------------------------------------------------
gesvc2  blade-0-0.local ASSIGNED host       60    Got execd update event
        blade-0-1.local ASSIGNED host       60    Got execd update event
        blade-0-8.local ASSIGNED host       60    Got execd update event
        blade-0-9.local ASSIGNED host       60    Got execd update event
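
To list exactly which hosts run a JVM but are not assigned to any service, the two outputs above can be compared. This is a rough Python sketch working on the pasted text only; it assumes the host name is always the column immediately before the state keyword, and `hosts_before` is a made-up helper, not an sdmadm feature.

```python
# Rows pasted from the "sdmadm sj" output above (header lines omitted)
sj_output = """\
cs_vm blade-0-0.local STARTED           4M       28M
      blade-0-1.local STARTED           4M       28M
      blade-0-2.local STARTED           6M       28M
      blade-0-3.local STARTED           9M       28M
      blade-0-4.local STARTED           6M       28M
      blade-0-5.local STARTED           8M       28M
      blade-0-6.local STARTED           8M       28M
      blade-0-8.local STARTED           6M       28M
      blade-0-9.local STARTED           6M       28M
      llgriddev.local STARTED          41M      568M
"""

# Rows pasted from the "sdmadm sr" output above (header lines omitted)
sr_output = """\
gesvc2  blade-0-0.local ASSIGNED host       60    Got execd update event
        blade-0-1.local ASSIGNED host       60    Got execd update event
        blade-0-8.local ASSIGNED host       60    Got execd update event
        blade-0-9.local ASSIGNED host       60    Got execd update event
"""


def hosts_before(text, state):
    """Collect the token right before `state` on each line (assumed layout)."""
    hosts = set()
    for line in text.splitlines():
        tokens = line.split()
        if state in tokens and tokens.index(state) > 0:
            hosts.add(tokens[tokens.index(state) - 1])
    return hosts


jvm_hosts = hosts_before(sj_output, "STARTED")
assigned_hosts = hosts_before(sr_output, "ASSIGNED")
unassigned = sorted(jvm_hosts - assigned_hosts)
print(unassigned)
```

With the outputs above this prints blade-0-2.local through blade-0-6.local plus llgriddev.local. llgriddev.local is the host the commands were run on, so it may not be expected to show up as a resource.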


Thanks,
- Chansup

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=212143
