[GE users] SDM issues

cbyun cbyun at ll.mit.edu
Wed Jul 22 18:46:01 BST 2009


I have set up Grid Engine and cloud services using the SGE 6.2u3 release.
Now I have a GE service in the UNKNOWN state, and I don't know how to delete it.

Also, my cloud service is running, but I see some issues in the log file.


Issue 1: How do I delete a service that is STARTED but in the UNKNOWN state?

# sdmadm ss
host            service    cstate  sstate
------------------------------------------
llgriddev.local gesvc      STARTED UNKNOWN
                gesvc2     STARTED RUNNING
                power      STARTED RUNNING
                spare_pool STARTED RUNNING

# sdmadm sds -s gesvc -fr
service result message
-----------------------------------------------------
gesvc   ERROR  Can not stop service, it is not active
Error: Command has generated error.

# sdmadm remove_service -s gesvc
Error: Operation on component cannot be performed. Component in illegal state: STARTED
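
To spot this condition without reading the table by eye, the `sdmadm ss` output can be filtered for services whose sstate is not RUNNING. The following is only a minimal sketch: the function names are hypothetical, the column layout is assumed to match the sample above, and it diagnoses the problem but does not fix it.

```python
# Sample `sdmadm ss` output copied from the message above.
SS_OUTPUT = """\
host            service    cstate  sstate
------------------------------------------
llgriddev.local gesvc      STARTED UNKNOWN
                gesvc2     STARTED RUNNING
                power      STARTED RUNNING
                spare_pool STARTED RUNNING
"""


def parse_service_states(text):
    """Return a list of (host, service, cstate, sstate) tuples.

    The host column is blank on continuation rows, so the last host
    seen is carried forward (an assumption based on the sample layout).
    """
    rows = []
    host = None
    for line in text.splitlines()[2:]:  # skip the header and the rule line
        fields = line.split()
        if len(fields) == 4:
            host, service, cstate, sstate = fields
        elif len(fields) == 3:
            service, cstate, sstate = fields
        else:
            continue  # blank or unexpected line
        rows.append((host, service, cstate, sstate))
    return rows


def unhealthy_services(rows):
    """Keep only the rows whose sstate column is not RUNNING."""
    return [row for row in rows if row[3] != "RUNNING"]


if __name__ == "__main__":
    for host, service, cstate, sstate in unhealthy_services(
        parse_service_states(SS_OUTPUT)
    ):
        print(f"{service} on {host}: cstate={cstate}, sstate={sstate}")
```

In practice the text would come from running `sdmadm ss` rather than from a hard-coded string.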

Issue 2: How to resolve the VPN server issue?

I used the following config for the cloud service:

    <cloud_adapter:vpn xsi:type="cloud_adapter:OpenVPNConfig"
                       vpnBinDir="/usr/sbin"
                       vpnConfigDir="/tmp"
                       vpnRequired="true"/>

This is taken from Richard's blog: http://blogs.sun.com/rhierlmeier/entry/using_sdm_cloud_adapter_to

What does the following actually mean?
Service power:Problem: VPN server is corrupted! Registered but server-less resources:


07/22/2009 13:37:22|48|vice.impl.cloud.CloudSnapshot.checkCloudState|W|Service power:Problem: VPN server is corrupted! Registered but server-less resources: [[hostname: blade-0-0.local, instanceId: i-blade-0-0, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-1.local, instanceId: i-blade-0-1, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-2.local, instanceId: i-blade-0-2, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-3.local, instanceId: i-blade-0-3, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-4.local, instanceId: i-blade-0-4, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-5.local, instanceId: i-blade-0-5, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-6.local, instanceId: i-blade-0-6, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-7.local, instanceId: i-blade-0-7, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-8.local, instanceId: i-blade-0-8, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-9.local, instanceId: i-blade-0-9, launchTime: 2009-07-21T09:56:03.000Z] ].
07/22/2009 13:37:22|48|ice.impl.cloud.CloudResourceAmountOptTask.run|I|Service power:Service is in error recovery mode. Skipping resource amount optimization cylce.


# sdmadm sr
service    id              state    type flags usage annotation
---------------------------------------------------------------------------------------------
gesvc2     blade-0-1.local ASSIGNED host SA    1     Got execd update event
           blade-0-2.local ASSIGNED host SA    1     Got execd update event
           blade-0-3.local ASSIGNED host SA    1     Got execd update event
           blade-0-4.local ASSIGNED host SA    1     Got execd update event
           blade-0-5.local ASSIGNED host SA    1     Got execd update event
           blade-0-6.local ASSIGNED host SA    1     Got execd update event
           blade-0-8.local ASSIGNED host SA    1     Got execd update event
           blade-0-9.local ASSIGNED host SA    1     Got execd update event
power      blade-0-0.local ASSIGNED host A     2     Resource is used by two or more services
           blade-0-1.local ASSIGNED host A     2     Resource is used by two or more services
           blade-0-2.local ASSIGNED host A     2     Resource is used by two or more services
           blade-0-3.local ASSIGNED host A     2     Resource is used by two or more services
           blade-0-4.local ASSIGNED host A     2     Resource is used by two or more services
           blade-0-5.local ASSIGNED host A     2     Resource is used by two or more services
           blade-0-6.local ASSIGNED host A     2     Resource is used by two or more services
           blade-0-7.local ASSIGNED host A     2     Resource is used by two or more services
           blade-0-8.local ASSIGNED host A     2     Resource is used by two or more services
           blade-0-9.local ASSIGNED host A     2     Resource is used by two or more services
spare_pool blade-0-0.local ASSIGNED host A     1     Resource is used by two or more services
           blade-0-7.local ASSIGNED host A     1     Resource is used by two or more services
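
To cross-check the hosts named in the warning against the `sdmadm sr` table, something like the following could be used. This is a rough sketch only: the helper names are hypothetical, the sample strings are abbreviated copies of the log line and table above, and the parsing assumes the layout shown there. It shows which services list each "server-less" host and does not explain the warning itself.

```python
import re

# Abbreviated copy of the warning text from the log (two of the ten entries).
LOG_FRAGMENT = (
    "Registered but server-less resources: "
    "[[hostname: blade-0-0.local, instanceId: i-blade-0-0, "
    "launchTime: 2009-07-21T09:56:03.000Z] , "
    "[hostname: blade-0-1.local, instanceId: i-blade-0-1, "
    "launchTime: 2009-07-21T09:56:03.000Z] ]."
)

# Abbreviated copy of the `sdmadm sr` output.
SR_OUTPUT = """\
service    id              state    type flags usage annotation
---------------------------------------------------------------
gesvc2     blade-0-1.local ASSIGNED host SA    1     Got execd update event
power      blade-0-0.local ASSIGNED host A     2     Resource is used by two or more services
           blade-0-1.local ASSIGNED host A     2     Resource is used by two or more services
spare_pool blade-0-0.local ASSIGNED host A     1     Resource is used by two or more services
"""


def hosts_in_warning(log_text):
    """Extract every hostname listed in the warning, in order."""
    return re.findall(r"hostname:\s*([^,\]\s]+)", log_text)


def services_by_host(sr_text):
    """Map each resource id to the services that list it.

    Rows that start with whitespace are treated as continuations of the
    previous service (an assumption based on the sample layout).
    """
    owners = {}
    service = None
    for line in sr_text.splitlines()[2:]:  # skip the header and the rule line
        if not line.strip():
            continue
        fields = line.split()
        if line[0].isspace():
            resource_id = fields[0]
        else:
            service, resource_id = fields[0], fields[1]
        owners.setdefault(resource_id, []).append(service)
    return owners


if __name__ == "__main__":
    owners = services_by_host(SR_OUTPUT)
    for host in hosts_in_warning(LOG_FRAGMENT):
        print(host, "->", ", ".join(owners.get(host, [])) or "(not listed)")
```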


Thanks,
- Chansup

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=208953
