[Hedeby users] Re: [GE users] SDM issues

rhierlmeier richard.hierlmeier at sun.com
Mon Jul 27 07:45:55 BST 2009


Hi Chansup,

On 07/24/09 21:56, cbyun wrote:
> Hi Richard and Torsten,
> 
> I have removed the resources from GE and spare_pool services.
> Now I see there is no alarm on the resources.  However, I am still having difficulty registering the hosts.
> 
> Is the dot (.) allowed in the hostname?
> My blade machines have hostnames of the form "blade-<number>-<number>.local"
> 
> See more details below:
> 
> # sdmadm sr
> service id              state    type flags usage annotation
> ------------------------------------------------------------
> power   blade-0-0.local ASSIGNED host       2
>         blade-0-1.local ASSIGNED host       2
>         blade-0-2.local ASSIGNED host       2
>         blade-0-3.local ASSIGNED host       2
>         blade-0-4.local ASSIGNED host       2
>         blade-0-5.local ASSIGNED host       2
>         blade-0-6.local ASSIGNED host       2
>         blade-0-7.local ASSIGNED host       2
>         blade-0-8.local ASSIGNED host       2
> 
> # sdmadm sslo
> service    slo                 quantity urgency request
> --------------------------------------------------------------------------------------------------
> gesvc2     fixed_usage         0        0       SLO has no needs
>            maxPendingJobs      0        0       SLO has no needs
> power      PermanentRequestSLO 10       2       type = "host" & owner = "power"
> spare_pool PermanentRequestSLO 5        1       type = "host"
> 
> 
> However, I am still getting the following error in the cs_vm-0.log (BTW, I am using simple installation option)
> 
> 07/24/2009 13:37:51|680|vice.impl.cloud.CloudSnapshot.checkCloudState|W|Service power:Problem: VPN server is corrupted! Registered but server-less resources: [[hostname: blade-0-0.local, instanceId: i-blade-0-0, launchTime: 2009-07-21T09:56:03.000Z] , ...
> 
> One thing I noticed is that, although the instanceID is supposed to be i-<hostname>, what is shown from the log is that it cuts out the ".local". It says: instanceId: i-blade-0-0
> 
> Is this an issue?

No, I think this is not an issue as long as the long and short hostnames resolve
to the same IP address.
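
A quick way to check this on the SDM master host is sketched below. It is only
an illustration: the helper names resolve and same_address are made up here, and
it assumes that getent is available on your system.

```shell
#!/bin/sh
# Print the first address a name resolves to (assumes getent is installed).
resolve() {
    getent hosts "$1" | awk 'NR==1 {print $1}'
}

# Succeed only if the long hostname and its short form (everything before
# the first dot) both resolve, and resolve to the same address.
same_address() {
    long=$1
    short=${long%%.*}
    a=$(resolve "$long")
    b=$(resolve "$short")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_address blade-0-0.local && echo ok` should print "ok" if
both names map to the same address.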

> 
> I turned all hosts off. Also I stopped and restarted the power cloud service and got the following error:
> 
> 07/24/2009 14:45:34|703|.cloud.CloudServiceAdapterImpl.doStartService|I|Service power:Started cloud service adapter.
> 07/24/2009 14:45:35|704|.grm.util.EventListenerSupport$Worker.deliver|E|Event delivery problem: Timer already cancelled.
> 07/24/2009 14:49:35|705|vice.impl.cloud.CloudSnapshot.checkCloudState|W|Service power:The registered set of cloud host does not match the reported set!
> Registered mismatches [[hostname: blade-0-9.local, instanceId: i-blade-0-9, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-6.local, instanceId: i-blade-0-6, launchTime: 2009-07-21T09:56:03.000Z] ].
> Reported mismatches   []
> 07/24/2009 14:49:35|705|vice.impl.cloud.CloudSnapshot.checkCloudState|W|Service power:Problem: VPN server is corrupted! Registered but server-less resources: [[hostname: blade-0-6.local, instanceId: i-blade-0-6, launchTime: 2009-07-21T09:56:03.000Z] , [hostname: blade-0-9.local, instanceId: i-blade-0-9, launchTime: 2009-07-21T09:56:03.000Z] ].
> 07/24/2009 14:49:35|705|e.impl.cloud.CloudResourceAutoRecoverTask.run|W|Service power:Case NOT_REPORTED__NOT_REGISTERED__RESOURCE: This should not happen! The resource NOT_REPORTED__NOT_REGISTERED__RESOURCE does not seem to be a cloud resource at all. It is unknown to the cloud and not registered by the cloud service adapter! Please check your configuration!
> 

Stopping and restarting does not help. Please stop the SDM system, shut down all
hosts that should participate in power saving, and clean up the spool directory on
the SDM master host:

# cd <local_spool_dir>/spool
# rm `find . -name "*.srf"`  power/cloud_hosts.spool

The *.srf files contain the definitions of the resources. In the
cloud_hosts.spool file the cloud adapter stores which resources are reported
by the cloud and which resource is the vpn server.
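
If you prefer to script the cleanup, a minimal sketch could look like the
following. The function name clean_sdm_spool is hypothetical, the argument is
whatever <local_spool_dir>/spool is on your master host, and the sketch assumes
that the SDM system has already been stopped and that the service is named
"power" as in your setup.

```shell
#!/bin/sh
# Remove all resource definitions (*.srf) below the given spool directory
# and the cloud_hosts.spool file of the "power" service.
# Assumption: SDM is stopped before this is called.
clean_sdm_spool() {
    spool=$1
    [ -d "$spool" ] || { echo "no such directory: $spool" >&2; return 1; }
    # -exec ... {} + passes the file names to rm directly, so names
    # containing spaces are handled, unlike with backticks.
    find "$spool" -name '*.srf' -type f -exec rm -f {} +
    rm -f "$spool/power/cloud_hosts.spool"
}
```

Other files in the spool directory are left untouched.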

Afterwards you can restart the SDM system. If the min/max attribute of the
resource optimizer of the power cloud adapter is set to 1, one blade will be
started; it is treated as the vpn server. As long as this host is kept alive,
the power saving cloud adapter should work.

The big problem is the vpn server. For the next release we must work on turning
vpn off. The power saving use case currently works only if all hosts are
shut down before they are added to the SDM system. The power saving cloud adapter
will treat the first started host as the vpn server and will never shut down this host.


Richard


> And the sdmadm sr  shows:
> 
> # sdmadm sr
> service id              state type flags usage annotation                                                                                                  
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> power   blade-0-8.local ERROR host       2     Service power:Resource does not seem to be a cloud resource! It is unknown to the cloud and not registered by the cloud service adapter!
> 
> 



> Thanks,
> - Chansup
> 
> ------------------------------------------------------
> http://hedeby.sunsource.net/ds/viewMessage.do?dsForumId=160&dsMessageId=209388
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at hedeby.sunsource.net].


-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Richard Hierlmeier           Phone: ++49 (0)941 3075-223
Software Engineering         Fax:   ++49 (0)941 3075-222
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7	     mailto: richard.hierlmeier at sun.com
D-93049 Regensburg           http://www.sun.com/grid

Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=209674

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


