[Hedeby users] Re: [GE users] SDM issues

cbyun cbyun at ll.mit.edu
Thu Jul 23 14:13:39 BST 2009


Hi Richard and Torsten,

Yes, I'm interested in the power saving use case using the cloud service.
I'll follow your suggestions and let you know how it goes.

Please see more questions/comments below.

> -----Original Message-----
> From: Torsten.Blix at sun.com [mailto:Torsten.Blix at sun.com]
> Sent: Thursday, July 23, 2009 4:26 AM
> To: users at gridengine.sunsource.net
> Cc: users
> Subject: [Hedeby users] Re: [GE users] SDM issues
>
> Hi Chansup,
>
> we are assuming that you are trying out the power saving use case, is that
> correct? For this use case the VPN configuration is irrelevant.
>
> On 07/22/09 19:46, cbyun wrote:
> > Issue 2: How to resolve the VPN server issue?
> >
> > I used the following config for cloud service:
> >
> >     <cloud_adapter:vpn xsi:type="cloud_adapter:OpenVPNConfig"
> >                        vpnBinDir="/usr/sbin"
> >                        vpnConfigDir="/tmp"
> >                        vpnRequired="true"/>
> >


If the VPN is irrelevant, should I use "false" for the vpnRequired parameter?



> > Which is taken from Richard's blog:
> > http://blogs.sun.com/rhierlmeier/entry/using_sdm_cloud_adapter_to
> >
> > What does the following actually mean?
> > Service power:Problem: VPN server is corrupted! Registered but server-less resources:
>
> The cloud adapter is designed for usage with a VPN. The power saving
> feature in a local network does not need the VPN, however the vpn
> configuration parameters are mandatory anyway. The logic connected with
> handling the VPN server is also used for the power saving use case. This
> logic says that if the "VPN server" is not reachable, all other "cloud"
> hosts are not reachable. "VPN server" here simply stands for a specific
> host (the first discovered resource), no VPN is actually running.
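To check that I understand the rule described above, here is a minimal sketch of it in Python. It is only my reading of the description, not the cloud adapter's actual code; the function name usable_resources and the "reachable" set are made up for illustration.

```python
def usable_resources(discovered, reachable):
    """Model of the rule described above (an assumption, not the adapter's code).

    discovered: host names in discovery order; the first one plays the
                role of the "VPN server".
    reachable:  set of host names that currently respond.
    """
    if not discovered:
        return []
    vpn_server = discovered[0]
    # If the "VPN server" host is unreachable, every cloud host is
    # treated as unreachable, whatever its own state is.
    if vpn_server not in reachable:
        return []
    return [host for host in discovered if host in reachable]
```

With blade-0-0.local discovered first, a problem reported for that one host would be enough to mark all ten blades as unusable under this model.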
>
> The cloud adapter regularly observes the state of all resources. For
> this purpose it calls the showCloudHostsScript. This script returns a
> list of all resources and their state.

I manually executed the showCloudHostsScript and got the following XML output.
Is there anything wrong with it? Could the dot (.) in the hostname be a problem?

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <DescribeInstancesResponse xmlns="http://ec2.amazonaws.com/doc/2008-08-08/">
      <requestId>host_request_id</requestId>
      <reservationSet>
        <item>
          <reservationId>r-host-7e99a8e938</reservationId>
          <ownerId>391035046281</ownerId>
          <groupSet>
            <item>
              <groupId></groupId>
            </item>
          </groupSet>
          <instancesSet>
            <item>
              <instanceId>i-blade-0-0.local</instanceId>
              <imageId>blade-0-0.local</imageId>
              <instanceState>
                <code>16</code>
                <name>running</name>
              </instanceState>
              <privateDnsName>blade-0-0.local</privateDnsName>
              <dnsName>2009-07-23T08:45:29.000Z</dnsName>
              <reason/>
              <keyName>mytest-keypair</keyName>
              <amiLaunchIndex>0</amiLaunchIndex>
              <productCodes/>
              <instanceType>m1.small</instanceType>
              <launchTime></launchTime>
              <placement>
                <availabilityZone>us-east-1b</availabilityZone>
              </placement>
              <kernelId>aki-6552b60c</kernelId>
              <ramdiskId>ari-6452b60d</ramdiskId>
            </item>
            <item>
              <instanceId>i-blade-0-1.local</instanceId>
              <imageId>blade-0-1.local</imageId>
              <instanceState>
                <code>16</code>
                <name>running</name>
              </instanceState>
              <privateDnsName>blade-0-1.local</privateDnsName>
              <dnsName>2009-07-23T08:45:29.000Z</dnsName>
              <reason/>
              <keyName>mytest-keypair</keyName>
              <amiLaunchIndex>0</amiLaunchIndex>
              <productCodes/>
              <instanceType>m1.small</instanceType>
              <launchTime></launchTime>
              <placement>
                <availabilityZone>us-east-1b</availabilityZone>
              </placement>
              <kernelId>aki-6552b60c</kernelId>
              <ramdiskId>ari-6452b60d</ramdiskId>
            </item>
            <item>
              <instanceId>i-blade-0-2.local</instanceId>
              <imageId>blade-0-2.local</imageId>
              <instanceState>
                <code>16</code>
                <name>running</name>
              </instanceState>
              <privateDnsName>blade-0-2.local</privateDnsName>
              <dnsName>2009-07-23T08:45:29.000Z</dnsName>
              <reason/>
              <keyName>mytest-keypair</keyName>
              <amiLaunchIndex>0</amiLaunchIndex>
              <productCodes/>
              <instanceType>m1.small</instanceType>
              <launchTime></launchTime>
              <placement>
                <availabilityZone>us-east-1b</availabilityZone>
              </placement>
              <kernelId>aki-6552b60c</kernelId>
              <ramdiskId>ari-6452b60d</ramdiskId>
            </item>
            <item>
              <instanceId>i-blade-0-3.local</instanceId>
              <imageId>blade-0-3.local</imageId>
              <instanceState>
                <code>16</code>
                <name>running</name>
              </instanceState>
              <privateDnsName>blade-0-3.local</privateDnsName>
              <dnsName>2009-07-23T08:45:29.000Z</dnsName>
              <reason/>
              <keyName>mytest-keypair</keyName>
              <amiLaunchIndex>0</amiLaunchIndex>
              <productCodes/>
              <instanceType>m1.small</instanceType>
              <launchTime></launchTime>
              <placement>
                <availabilityZone>us-east-1b</availabilityZone>
              </placement>
              <kernelId>aki-6552b60c</kernelId>
              <ramdiskId>ari-6452b60d</ramdiskId>
            </item>
            <item>
              <instanceId>i-blade-0-4.local</instanceId>
              <imageId>blade-0-4.local</imageId>
              <instanceState>
                <code>16</code>
                <name>running</name>
              </instanceState>
              <privateDnsName>blade-0-4.local</privateDnsName>
              <dnsName>2009-07-23T08:45:29.000Z</dnsName>
              <reason/>
              <keyName>mytest-keypair</keyName>
              <amiLaunchIndex>0</amiLaunchIndex>
              <productCodes/>
              <instanceType>m1.small</instanceType>
              <launchTime></launchTime>
              <placement>
                <availabilityZone>us-east-1b</availabilityZone>
              </placement>
              <kernelId>aki-6552b60c</kernelId>
              <ramdiskId>ari-6452b60d</ramdiskId>
            </item>
            <item>
              <instanceId>i-blade-0-5.local</instanceId>
              <imageId>blade-0-5.local</imageId>
              <instanceState>
                <code>16</code>
                <name>running</name>
              </instanceState>
              <privateDnsName>blade-0-5.local</privateDnsName>
              <dnsName>2009-07-23T08:45:29.000Z</dnsName>
              <reason/>
              <keyName>mytest-keypair</keyName>
              <amiLaunchIndex>0</amiLaunchIndex>
              <productCodes/>
              <instanceType>m1.small</instanceType>
              <launchTime></launchTime>
              <placement>
                <availabilityZone>us-east-1b</availabilityZone>
              </placement>
              <kernelId>aki-6552b60c</kernelId>
              <ramdiskId>ari-6452b60d</ramdiskId>
            </item>
            <item>
              <instanceId>i-blade-0-6.local</instanceId>
              <imageId>blade-0-6.local</imageId>
              <instanceState>
                <code>16</code>
                <name>running</name>
              </instanceState>
              <privateDnsName>blade-0-6.local</privateDnsName>
              <dnsName>2009-07-23T08:45:29.000Z</dnsName>
              <reason/>
              <keyName>mytest-keypair</keyName>
              <amiLaunchIndex>0</amiLaunchIndex>
              <productCodes/>
              <instanceType>m1.small</instanceType>
              <launchTime></launchTime>
              <placement>
                <availabilityZone>us-east-1b</availabilityZone>
              </placement>
              <kernelId>aki-6552b60c</kernelId>
              <ramdiskId>ari-6452b60d</ramdiskId>
            </item>
            <item>
              <instanceId>i-blade-0-7.local</instanceId>
              <imageId>blade-0-7.local</imageId>
              <instanceState>
                <code>16</code>
                <name>running</name>
              </instanceState>
              <privateDnsName>blade-0-7.local</privateDnsName>
              <dnsName>2009-07-23T08:45:29.000Z</dnsName>
              <reason/>
              <keyName>mytest-keypair</keyName>
              <amiLaunchIndex>0</amiLaunchIndex>
              <productCodes/>
              <instanceType>m1.small</instanceType>
              <launchTime></launchTime>
              <placement>
                <availabilityZone>us-east-1b</availabilityZone>
              </placement>
              <kernelId>aki-6552b60c</kernelId>
              <ramdiskId>ari-6452b60d</ramdiskId>
            </item>
            <item>
              <instanceId>i-blade-0-8.local</instanceId>
              <imageId>blade-0-8.local</imageId>
              <instanceState>
                <code>16</code>
                <name>running</name>
              </instanceState>
              <privateDnsName>blade-0-8.local</privateDnsName>
              <dnsName>2009-07-23T08:45:29.000Z</dnsName>
              <reason/>
              <keyName>mytest-keypair</keyName>
              <amiLaunchIndex>0</amiLaunchIndex>
              <productCodes/>
              <instanceType>m1.small</instanceType>
              <launchTime></launchTime>
              <placement>
                <availabilityZone>us-east-1b</availabilityZone>
              </placement>
              <kernelId>aki-6552b60c</kernelId>
              <ramdiskId>ari-6452b60d</ramdiskId>
            </item>
            <item>
              <instanceId>i-blade-0-9.local</instanceId>
              <imageId>blade-0-9.local</imageId>
              <instanceState>
                <code>16</code>
                <name>running</name>
              </instanceState>
              <privateDnsName>blade-0-9.local</privateDnsName>
              <dnsName>2009-07-23T08:45:29.000Z</dnsName>
              <reason/>
              <keyName>mytest-keypair</keyName>
              <amiLaunchIndex>0</amiLaunchIndex>
              <productCodes/>
              <instanceType>m1.small</instanceType>
              <launchTime></launchTime>
              <placement>
                <availabilityZone>us-east-1b</availabilityZone>
              </placement>
              <kernelId>aki-6552b60c</kernelId>
              <ramdiskId>ari-6452b60d</ramdiskId>
            </item>
          </instancesSet>
        </item>
      </reservationSet>
    </DescribeInstancesResponse>
  </soap:Body>
</soap:Envelope>
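
One way to sanity-check this output is to walk over every instance and flag fields that look odd. The sketch below is only a guess at what is worth checking, not the cloud adapter's own validation: it reports an empty launchTime and a dnsName that looks like a timestamp, both of which occur in the output above. Whether either of them triggers the "VPN server is corrupted" message is not confirmed.

```python
import re
import xml.etree.ElementTree as ET

# Default namespace of the DescribeInstancesResponse shown above.
EC2_NS = {"ec2": "http://ec2.amazonaws.com/doc/2008-08-08/"}
# Matches the start of values such as 2009-07-23T08:45:29.000Z.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def _text(item, tag):
    """Return the stripped text of a child element, or "" if absent or empty."""
    value = item.findtext("ec2:" + tag, default="", namespaces=EC2_NS)
    return (value or "").strip()


def check_instances(xml_data):
    """Return (instanceId, problem) pairs for suspicious instance entries.

    The two checks are assumptions about what might confuse a consumer
    of this output; they are not taken from the cloud adapter itself.
    """
    root = ET.fromstring(xml_data)
    problems = []
    for item in root.iterfind(".//ec2:instancesSet/ec2:item", EC2_NS):
        instance_id = _text(item, "instanceId")
        if not _text(item, "launchTime"):
            problems.append((instance_id, "launchTime is empty"))
        if TIMESTAMP.match(_text(item, "dnsName")):
            problems.append((instance_id, "dnsName looks like a timestamp"))
    return problems
```

If the script output is saved to a file (hosts.xml is a hypothetical name), check_instances(open("hosts.xml", "rb").read()) returns the list of findings, one entry per instance and problem.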


Thanks,
- Chansup



>
> > 07/22/2009 13:37:22|48|vice.impl.cloud.CloudSnapshot.checkCloudState|W|Service power:Problem: VPN server is corrupted! Registered but server-less resources:
> >   [[hostname: blade-0-0.local, instanceId: i-blade-0-0, launchTime: 2009-07-21T09:56:03.000Z] ,
> >   [hostname: blade-0-1.local, instanceId: i-blade-0-1, launchTime: 2009-07-21T09:56:03.000Z] ,
> >   [hostname: blade-0-2.local, instanceId: i-blade-0-2, launchTime: 2009-07-21T09:56:03.000Z] ,
> >   [hostname: blade-0-3.local, instanceId: i-blade-0-3, launchTime: 2009-07-21T09:56:03.000Z] ,
> >   [hostname: blade-0-4.local, instanceId: i-blade-0-4, launchTime: 2009-07-21T09:56:03.000Z] ,
> >   [hostname: blade-0-5.local, instanceId: i-blade-0-5, launchTime: 2009-07-21T09:56:03.000Z] ,
> >   [hostname: blade-0-6.local, instanceId: i-blade-0-6, launchTime: 2009-07-21T09:56:03.000Z] ,
> >   [hostname: blade-0-7.local, instanceId: i-blade-0-7, launchTime: 2009-07-21T09:56:03.000Z] ,
> >   [hostname: blade-0-8.local, instanceId: i-blade-0-8, launchTime: 2009-07-21T09:56:03.000Z] ,
> >   [hostname: blade-0-9.local, instanceId: i-blade-0-9, launchTime: 2009-07-21T09:56:03.000Z] ].
> > 07/22/2009 13:37:22|48|ice.impl.cloud.CloudResourceAmountOptTask.run|I|Service power:Service is in error recovery mode. Skipping resource amount optimization cylce.
>
> The above error message indicates that the showCloudHostsScript somehow
> reported an error for the "VPN server". The cloud adapter therefore
> assumes that all cloud resources cannot be used (error recovery mode).
>
> > # sdmadm sr
> > service    id              state    type flags usage annotation
> > ---------------------------------------------------------------------------------------------
> > gesvc2     blade-0-1.local ASSIGNED host SA    1     Got execd update event
> >            blade-0-2.local ASSIGNED host SA    1     Got execd update event
> >            blade-0-3.local ASSIGNED host SA    1     Got execd update event
> >            blade-0-4.local ASSIGNED host SA    1     Got execd update event
> >            blade-0-5.local ASSIGNED host SA    1     Got execd update event
> >            blade-0-6.local ASSIGNED host SA    1     Got execd update event
> >            blade-0-8.local ASSIGNED host SA    1     Got execd update event
> >            blade-0-9.local ASSIGNED host SA    1     Got execd update event
> > power      blade-0-0.local ASSIGNED host A     2     Resource is used by two or more services
> >            blade-0-1.local ASSIGNED host A     2     Resource is used by two or more services
> >            blade-0-2.local ASSIGNED host A     2     Resource is used by two or more services
> >            blade-0-3.local ASSIGNED host A     2     Resource is used by two or more services
> >            blade-0-4.local ASSIGNED host A     2     Resource is used by two or more services
> >            blade-0-5.local ASSIGNED host A     2     Resource is used by two or more services
> >            blade-0-6.local ASSIGNED host A     2     Resource is used by two or more services
> >            blade-0-7.local ASSIGNED host A     2     Resource is used by two or more services
> >            blade-0-8.local ASSIGNED host A     2     Resource is used by two or more services
> >            blade-0-9.local ASSIGNED host A     2     Resource is used by two or more services
> > spare_pool blade-0-0.local ASSIGNED host A     1     Resource is used by two or more services
> >            blade-0-7.local ASSIGNED host A     1     Resource is used by two or more services
>
> This output indicates some further problems. All resources in GE service
> are static. This indicates (unless you set this by hand) that the SDM
> executor component is not running on these resources. GE adapter will
> not be able to uninstall these hosts, they are treated as static (flag
> S). It seems that the GE adapter has autodiscovered them. Does the execd
> start automatically at boot time? For the power saving use case it
> must not.
>
> However, we are assuming that the SDM executor on the power saving hosts
> IS installed in such a way that the SDM executor is started
> automatically at boot time or that there exists a post-startup-hook
> script that starts the SDM executor. The sample implementation of the
> power saving scripts does not describe this fact.
>
> In your case, both the power saving service and the GE service
> autodiscovered the same resources. If a resource is owned by more than
> one service the resource provider treats the resource as ambiguous (flag
> A).
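The ambiguity rule above can be illustrated with a short Python sketch. This is not the resource provider's implementation, only a model of "owned by more than one service"; the function name ambiguous_resources is made up, and the assignments are transcribed from the sdmadm sr output quoted earlier.

```python
from collections import defaultdict


def ambiguous_resources(assignments):
    """Map each resource owned by more than one service to those services."""
    owners = defaultdict(set)
    for service, resources in assignments.items():
        for resource in resources:
            owners[resource].add(service)
    return {r: sorted(s) for r, s in owners.items() if len(s) > 1}


# Service -> resources, as listed in the `sdmadm sr` output above.
ASSIGNMENTS = {
    "gesvc2": [f"blade-0-{i}.local" for i in (1, 2, 3, 4, 5, 6, 8, 9)],
    "power": [f"blade-0-{i}.local" for i in range(10)],
    "spare_pool": ["blade-0-0.local", "blade-0-7.local"],
}

for resource, services in sorted(ambiguous_resources(ASSIGNMENTS).items()):
    print(resource, "->", ", ".join(services))
```

With these assignments every one of the ten blades is owned by two services, which matches the A flag on every row.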
>
> The recommended way to set up the power saving service is to not have
> any resources in the system when installing the power saving service.
>
> To solve your problem do the following steps:
>
> 1. Please ensure that the managed host installation has been executed on
> all hosts (best with the -autostart flag set).
>
> 2. Start up the SDM executors on all managed hosts. This should remove
> the static flag from the resources in the GE service.
>
> 3. Remove the (still ambiguous) resources from the GE service with
>      sdmadm rr -r <resource_name> -s gesvc2
>     This should remove the ambiguous status from the remaining resource
> of this name.
>
> 4. Implement a post-startup-hook for the power saving scripts:
>     a) if the executor is started at boot time, the post-startup-hook
> should block until the executor is up and running.
>     b) otherwise the post-startup-hook must start up the SDM executor and
> wait until it is up and running.
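Steps 3 and 4 above could be scripted roughly as follows. This is a minimal Python sketch under assumptions: removal_commands only builds the sdmadm rr command lines from step 3 without running them, and wait_until is a generic polling helper for the post-startup-hook. How to test whether the SDM executor is up is not described in this thread, so the is_ready callable is a placeholder that has to be supplied.

```python
import shlex
import time


def removal_commands(resources, service):
    """Build one `sdmadm rr` command line per resource (step 3).

    The commands are returned as strings, not executed.
    """
    return [
        "sdmadm rr -r {} -s {}".format(shlex.quote(r), shlex.quote(service))
        for r in resources
    ]


def wait_until(is_ready, timeout=300.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or `timeout` seconds pass (step 4).

    is_ready is a caller-supplied check (hypothetical here); returns True
    if the check succeeded in time, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, removal_commands(["blade-0-1.local"], "gesvc2") yields the single command line "sdmadm rr -r blade-0-1.local -s gesvc2". The timeout and interval defaults are arbitrary and should be tuned to the boot time of the hosts.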
>
> Cheers,
> Richard and Torsten
>
> ------------------------------------------------------
> http://hedeby.sunsource.net/ds/viewMessage.do?dsForumId=160&dsMessageId=209142
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at hedeby.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=209194

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


