[GE users] SDM issues

rhierlmeier richard.hierlmeier at sun.com
Thu Jul 23 08:14:56 BST 2009


Hi Chansup,

On 07/22/09 22:42, cbyun wrote:
> Hi Rys,
> 
> Thanks for your help on issue 1.
> I was able to remove the gesvc service.
> 
> I was confused between the shutdown_component and shutdown_service.
> I had to use "shutdown_component", not shutdown_service.
> 


That's really confusing. Let me explain it. SDM is a component-based system. It
knows the following components:

   resource_provider, executor, reporter, CA (certificate authority) and services.

Yes, services are also implemented as components. Each component has a
configuration, so you modify the configuration of a service by invoking
sdmadm mod_component (shortcut mc); a sdmadm mod_service command is not
available.

Service components have two different states: the component state (every
component has a component state) and the service state (only services have one).

The component state reflects the internal state of the software component. The
service state reflects the state of the real service (e.g. the state of the grid
engine cluster). If the service state is UNKNOWN, it means that the SDM system
has no clue what state the real service is in. For a GE service it means that
the connection to qmaster is not established.

If you stop a GE service with sdmadm shutdown_service (shortcut sds), the GE
service adapter closes the connection to qmaster and the service state goes to
UNKNOWN. The same happens if qmaster goes down: the GE service adapter detects
that the connection has been lost and sets the service state to UNKNOWN.

In contrast, sdmadm shutdown_component does more. First of all, it can be
invoked on all components (not only on services). Second, it stops all active
parts of the component.

The states of the components can be displayed with the sdmadm show_component
command. The sdmadm show_service command displays the component state and the
service state of all services (service components).
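
To illustrate the two state columns, here is a minimal Python sketch that reads
a listing shaped like the "sdmadm ss" output quoted further down in this thread
(assumed here to be the short form of show_service) and picks out the services
whose service state is UNKNOWN. The names ServiceRow and parse_show_service are
made up for this example and are not part of SDM, and the parsing rule (an empty
host column means "same host as the row above") is inferred from that single
sample only.

```python
from typing import List, NamedTuple


class ServiceRow(NamedTuple):
    host: str
    service: str
    cstate: str  # component state: state of the software component
    sstate: str  # service state: state of the real service


# Listing copied from the quoted message in this thread.
SAMPLE = """\
host            service    cstate  sstate
------------------------------------------
llgriddev.local gesvc      STARTED UNKNOWN
                gesvc2     STARTED RUNNING
                power      STARTED RUNNING
                spare_pool STARTED RUNNING
"""


def parse_show_service(text: str) -> List[ServiceRow]:
    """Parse the table body; rows without a host reuse the previous host."""
    rows = []
    host = ""
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Skip everything up to and including the dashed rule.
            in_body = line.startswith("---")
            continue
        fields = line.split()
        if len(fields) == 4:
            host, fields = fields[0], fields[1:]
        if len(fields) != 3:
            continue  # blank or unexpected line
        rows.append(ServiceRow(host, *fields))
    return rows


if __name__ == "__main__":
    for row in parse_show_service(SAMPLE):
        if row.sstate == "UNKNOWN":
            print(f"{row.service}: component {row.cstate}, service {row.sstate}")
```

For the sample, the only service reported is gesvc: its component is still
STARTED even though its service state is UNKNOWN.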


You can only remove a component (or a service) from the system if the component 
state is stopped.
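
The rules above can be sketched as a toy model in Python. This is only an
illustration under stated assumptions, not SDM code: the class and method names
are invented, the state spelling STOPPED is a guess (the thread only shows
STARTED, RUNNING and UNKNOWN), and the service state after shutdown_component
is assumed to be UNKNOWN because the thread does not say.

```python
class IllegalStateError(Exception):
    """Raised when an operation is not allowed in the current component state."""


class ServiceComponent:
    """Toy model of a service component with its two independent states."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cstate = "STARTED"  # component state
        self.sstate = "RUNNING"  # service state

    def shutdown_service(self) -> None:
        # Only the connection to the real service is closed;
        # the component itself keeps running.
        self.sstate = "UNKNOWN"

    def shutdown_component(self) -> None:
        # Stops all active parts of the component.
        self.sstate = "UNKNOWN"  # assumption: not stated in the thread
        self.cstate = "STOPPED"  # assumption: exact spelling of the state

    def remove(self) -> bool:
        # Removal is only allowed once the component state is stopped.
        if self.cstate != "STOPPED":
            raise IllegalStateError(
                f"Component in illegal state: {self.cstate}"
            )
        return True


if __name__ == "__main__":
    svc = ServiceComponent("gesvc")
    svc.shutdown_service()
    try:
        svc.remove()
    except IllegalStateError as err:
        print(f"remove refused: {err}")
    svc.shutdown_component()
    print("removed" if svc.remove() else "not removed")
```

The model shows why shutdown_service alone is not enough: it changes only the
service state, while removal checks the component state.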


   Richard


> Thanks again,
> - Chansup
> 
> 
>> -----Original Message-----
>> From: Ryszard.Macidlowski at sun.com [mailto:Ryszard.Macidlowski at sun.com]
>> Sent: Wednesday, July 22, 2009 4:13 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] SDM issues
>>
>> Hi Chansup,
>>
>> I'll just answer question 1 :) For question 2 you should get an answer
>> from the adapter experts :)
>>
>> cbyun wrote:
>>> I have set up grid engine and cloud services using the SGE 6.2u3 release.
>>> Now I have a GE service in unknown state and I don't know how to delete it.
>>> Also, my cloud service is running but I see some issues from the log file.
>>>
>>> Issue 1: How to delete the service in unknown state but started?
>>>
>> You need to stop the component. When a service is in the unknown state it
>> means it cannot be connected to, for whatever reason, which is why you can
>> stop it and free its resources.
>> To stop the component just use
>>
>> sdmadm sdc -c gesvc
>>
>> After that you should be able to remove the service.
>>
>> BTW, I believe sdmadm remove_service should also have a -force option to
>> remove a running service.
>>
>> Rys
>>> # sdmadm ss
>>> host            service    cstate  sstate
>>> ------------------------------------------
>>> llgriddev.local gesvc      STARTED UNKNOWN
>>>                 gesvc2     STARTED RUNNING
>>>                 power      STARTED RUNNING
>>>                 spare_pool STARTED RUNNING
>>>
>>> # sdmadm sds -s gesvc -fr
>>> service result message
>>> -----------------------------------------------------
>>> gesvc   ERROR  Can not stop service, it is not active
>>> Error: Command has generated error.
>>>
>>> # sdmadm remove_service -s gesvc
>>> Error: Operation on component cannot be performed. Component in illegal state: STARTED
>>>
>>> Issue 2: How to resolve the VPN server issue?
>>>
>>> I used the following config for cloud service:
>>>
>>>     <cloud_adapter:vpn xsi:type="cloud_adapter:OpenVPNConfig"
>>>                        vpnBinDir="/usr/sbin"
>>>                        vpnConfigDir="/tmp"
>>>                        vpnRequired="true"/>
>>>
>>> Which is taken from Richard's blog:
>>> http://blogs.sun.com/rhierlmeier/entry/using_sdm_cloud_adapter_to
>>>
>>> What does the following actually mean?
>>> Service power:Problem: VPN server is corrupted! Registered but server-less resources:
>>>
>>> 07/22/2009 13:37:22|48|vice.impl.cloud.CloudSnapshot.checkCloudState|W|Service power:Problem: VPN server is corrupted! Registered but server-less resources:
>>>   [[hostname: blade-0-0.local, instanceId: i-blade-0-0, launchTime: 2009-07-21T09:56:03.000Z] ,
>>>   [hostname: blade-0-1.local, instanceId: i-blade-0-1, launchTime: 2009-07-21T09:56:03.000Z] ,
>>>   [hostname: blade-0-2.local, instanceId: i-blade-0-2, launchTime: 2009-07-21T09:56:03.000Z] ,
>>>   [hostname: blade-0-3.local, instanceId: i-blade-0-3, launchTime: 2009-07-21T09:56:03.000Z] ,
>>>   [hostname: blade-0-4.local, instanceId: i-blade-0-4, launchTime: 2009-07-21T09:56:03.000Z] ,
>>>   [hostname: blade-0-5.local, instanceId: i-blade-0-5, launchTime: 2009-07-21T09:56:03.000Z] ,
>>>   [hostname: blade-0-6.local, instanceId: i-blade-0-6, launchTime: 2009-07-21T09:56:03.000Z] ,
>>>   [hostname: blade-0-7.local, instanceId: i-blade-0-7, launchTime: 2009-07-21T09:56:03.000Z] ,
>>>   [hostname: blade-0-8.local, instanceId: i-blade-0-8, launchTime: 2009-07-21T09:56:03.000Z] ,
>>>   [hostname: blade-0-9.local, instanceId: i-blade-0-9, launchTime: 2009-07-21T09:56:03.000Z] ].
>>> 07/22/2009 13:37:22|48|ice.impl.cloud.CloudResourceAmountOptTask.run|I|Service power:Service is in error recovery mode. Skipping resource amount optimization cylce.
>>>
>>> # sdmadm sr
>>> service    id              state    type flags usage annotation
>>> ---------------------------------------------------------------------------------------------
>>> gesvc2     blade-0-1.local ASSIGNED host SA    1     Got execd update event
>>>            blade-0-2.local ASSIGNED host SA    1     Got execd update event
>>>            blade-0-3.local ASSIGNED host SA    1     Got execd update event
>>>            blade-0-4.local ASSIGNED host SA    1     Got execd update event
>>>            blade-0-5.local ASSIGNED host SA    1     Got execd update event
>>>            blade-0-6.local ASSIGNED host SA    1     Got execd update event
>>>            blade-0-8.local ASSIGNED host SA    1     Got execd update event
>>>            blade-0-9.local ASSIGNED host SA    1     Got execd update event
>>> power      blade-0-0.local ASSIGNED host A     2     Resource is used by two or more services
>>>            blade-0-1.local ASSIGNED host A     2     Resource is used by two or more services
>>>            blade-0-2.local ASSIGNED host A     2     Resource is used by two or more services
>>>            blade-0-3.local ASSIGNED host A     2     Resource is used by two or more services
>>>            blade-0-4.local ASSIGNED host A     2     Resource is used by two or more services
>>>            blade-0-5.local ASSIGNED host A     2     Resource is used by two or more services
>>>            blade-0-6.local ASSIGNED host A     2     Resource is used by two or more services
>>>            blade-0-7.local ASSIGNED host A     2     Resource is used by two or more services
>>>            blade-0-8.local ASSIGNED host A     2     Resource is used by two or more services
>>>            blade-0-9.local ASSIGNED host A     2     Resource is used by two or more services
>>> spare_pool blade-0-0.local ASSIGNED host A     1     Resource is used by two or more services
>>>            blade-0-7.local ASSIGNED host A     1     Resource is used by two or more services
>>>
>>> Thanks,
>>> - Chansup
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=208953
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Richard Hierlmeier           Phone: ++49 (0)941 3075-223
Software Engineering         Fax:   ++49 (0)941 3075-222
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7	     mailto: richard.hierlmeier at sun.com
D-93049 Regensburg           http://www.sun.com/grid

Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=209127

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].