#866 new task

IZ225: need 'sdmadm add/remove/modify_slo'

Reported by: rhierlmeier Owned by:
Priority: normal Milestone:
Component: hedeby Version: 1.0
Severity: Keywords: Sun cli
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=225]

        Issue #:      225          Platform:     Sun         Reporter: rhierlmeier (rhierlmeier)
       Component:     hedeby          OS:        All
     Subcomponent:    cli          Version:      1.0            CC:    None defined
        Status:       NEW          Priority:     P3
      Resolution:                 Issue type:    TASK
                               Target milestone: 1.0u5next
      Assigned to:    adoerr (adoerr)
      QA Contact:     adoerr
          URL:
       * Summary:     need 'sdmadm add/remove/modify_slo'
   Status whiteboard:
      Attachments:


     Issue 225 blocks:


   Opened: Wed Nov 21 08:04:00 -0700 2007 
------------------------


   I think we should be able to change the SLOs on the fly (without
   stopping/starting the service).
   In GEAdapter I implemented such a feature in the reload method. This method
   reads the new configuration, compares it with the old one and decides whether
   the service must be restarted. If only the SLOs have changed, the
   service is not restarted; only the SLOManager is reconfigured.
   With this approach it is possible to add/modify/remove SLOs on the fly with
   the mod_config and reload_component commands.
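   A minimal sketch of the decision described above, written as a pure function so
   that both branches can be tested without a running service. ServiceConfig, its
   fields (clusterName, qmasterPort, slos) and the ReloadDecision values are
   hypothetical stand-ins; the real GEAdapter compares its own configuration classes.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: ServiceConfig and ReloadDecision are illustrative
// stand-ins, not the real GEAdapter configuration classes.
public class ReloadDecisionSketch {

    /** Simplified configuration: connection settings plus the SLO names. */
    record ServiceConfig(String clusterName, int qmasterPort, List<String> slos) {}

    enum ReloadDecision { NOTHING, RECONFIGURE_SLOS, RESTART }

    static ReloadDecision decide(ServiceConfig oldConfig, ServiceConfig newConfig) {
        // A change outside the SLO list requires a restart of the service.
        boolean sameConnection =
                Objects.equals(oldConfig.clusterName(), newConfig.clusterName())
                        && oldConfig.qmasterPort() == newConfig.qmasterPort();
        if (!sameConnection) {
            return ReloadDecision.RESTART;
        }
        // Only the SLOs differ: reconfiguring the SLO manager is sufficient.
        if (!oldConfig.slos().equals(newConfig.slos())) {
            return ReloadDecision.RECONFIGURE_SLOS;
        }
        return ReloadDecision.NOTHING;
    }

    public static void main(String[] args) {
        ServiceConfig current = new ServiceConfig("ge1", 6444, List.of("fixed_usage"));
        ServiceConfig sloChange =
                new ServiceConfig("ge1", 6444, List.of("fixed_usage", "max_pending_jobs"));
        ServiceConfig otherCluster = new ServiceConfig("ge2", 6444, List.of("fixed_usage"));
        System.out.println(decide(current, sloChange));    // RECONFIGURE_SLOS
        System.out.println(decide(current, otherCluster)); // RESTART
    }
}
```

   Keeping the decision separate from the actions (restart vs. SLOManager
   reconfiguration) lets the comparison logic be unit tested on its own.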


   However, to implement the add/modify/remove/show SLO commands we need the
   following steps:

   - The service interface needs a new method setSLOs.
   - In hedeby-common.xsd we have to define a global slo element.
   - The AbstractServiceConfig xml type already has the slo element;
     it is defined in hedeby-common.xsd.
   - We have to implement the following cli commands:

   sdmadm add_slo -s <service name> [-f <slo file>]
   sdmadm mod_slo -s <service name> -n <slo name>
   sdmadm remove_slo -s <service name> -n <slo name>
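   One way the three commands could map onto the proposed setSLOs method is a
   read-modify-write of the complete SLO list of a service. The sketch assumes a
   hypothetical SloService interface with getSLOs()/setSLOs() and a simplified Slo
   record; resolving the -s service name and reading the -f file are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: Slo, SloService and the helper methods are
// illustrative names, not existing hedeby classes.
public class SloCommandsSketch {

    record Slo(String name, String type, int urgency) {}

    /** Assumed extension of the service interface. */
    interface SloService {
        List<Slo> getSLOs();
        void setSLOs(List<Slo> slos);
    }

    /** Trivial in-memory service used for the demo. */
    static class InMemoryService implements SloService {
        private List<Slo> slos = new ArrayList<>();
        public List<Slo> getSLOs() { return List.copyOf(slos); }
        public void setSLOs(List<Slo> newSlos) { slos = new ArrayList<>(newSlos); }
    }

    static int indexOf(List<Slo> slos, String name) {
        for (int i = 0; i < slos.size(); i++) {
            if (slos.get(i).name().equals(name)) {
                return i;
            }
        }
        return -1;
    }

    /** add_slo: fails if an SLO with the same name already exists. */
    static void addSlo(SloService service, Slo slo) {
        List<Slo> slos = new ArrayList<>(service.getSLOs());
        if (indexOf(slos, slo.name()) >= 0) {
            throw new IllegalArgumentException("SLO already exists: " + slo.name());
        }
        slos.add(slo);
        service.setSLOs(slos); // hand the complete new SLO list to the service
    }

    /** mod_slo: replaces the SLO with the same name. */
    static void modSlo(SloService service, Slo slo) {
        List<Slo> slos = new ArrayList<>(service.getSLOs());
        int i = indexOf(slos, slo.name());
        if (i < 0) {
            throw new NoSuchElementException("No such SLO: " + slo.name());
        }
        slos.set(i, slo);
        service.setSLOs(slos);
    }

    /** remove_slo: removes the SLO with the given name. */
    static void removeSlo(SloService service, String name) {
        List<Slo> slos = new ArrayList<>(service.getSLOs());
        int i = indexOf(slos, name);
        if (i < 0) {
            throw new NoSuchElementException("No such SLO: " + name);
        }
        slos.remove(i);
        service.setSLOs(slos);
    }

    public static void main(String[] args) {
        SloService service = new InMemoryService();
        addSlo(service, new Slo("fixed_usage", "FixedUsageSLO", 50));
        modSlo(service, new Slo("fixed_usage", "FixedUsageSLO", 80));
        System.out.println(service.getSLOs());
        removeSlo(service, "fixed_usage");
        System.out.println(service.getSLOs().isEmpty()); // true
    }
}
```

   Passing the whole list to setSLOs keeps the service interface small: the service
   only has to replace its SLO set, while duplicate and missing-name checks stay in
   the command layer.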
               ------- Additional comments from rhierlmeier Wed Nov 21 08:26:33 -0700 2007 -------
   type changed to task
               ------- Additional comments from crei Fri Apr 4 04:47:21 -0700 2008 -------
   Support for these commands will be added later.
               ------- Additional comments from rhierlmeier Tue Aug 5 03:41:34 -0700 2008 -------
   The commands for modifying SLOs on the fly are very important. We should
   implement them in the near future.
               ------- Additional comments from afisch Tue Aug 12 06:15:02 -0700 2008 -------

   Extending the CLI with dynamic modifySLO commands

   Description:
   SDM could benefit from a set of commands that allows SLOs to be modified
   independently of the component configuration. Usually the components are set up
   only once, whereas modifying SLOs might be a more frequent task during the
   lifetime of an SDM system. Typical reasons might be a changed use case for a
   managed service or the addition of new services to a running SDM system.



   Evaluation:
   This issue is rated P3. It is not a mandatory but a handy feature: a set of
   dedicated commands to modify SLOs would simplify administration, and
   SLO management would be separated from the service/component setup.

   A couple of remarks should be considered before implementing this feature:

   1.) The update of changed SLOs might be implemented dynamically, i.e. without
   restarting the service. The GE Adapter can already update its configuration
   without an explicit service shutdown, and the update_SLO command could be
   implemented in a similar fashion. However, the managed domain may remain active
   while the service is shut down; this is the case for a GE instance. This
   implies that the dynamic modification feature is not mandatory, as long as a
   service restart does not interrupt the managed service domain.

   2.) It should be clarified whether SLO modifications lead to implicit updates
   (aka reload) or not. If an update is done implicitly, commands to show / add /
   remove / modify SLOs would be sufficient. Otherwise a fifth command to trigger
   the update is needed. From the usability point of view, the behavior of the
   SLO modification should be consistent with other SDM command sets. If
   modify_SLO leads to an implicit update, the user might expect this behavior for
   other modification actions, too (e.g. modify component needs an explicit update).
   From this point of view a separate update would be reasonable.

   3.) It should be discussed whether there are modify_SLO scenarios with side
   effects that have to be considered. Here is one example: when the SLOs of a set
   of services are modified, there might be a time delay between the separate
   modifications. This delay can lead to an imbalanced system. If, for example,
   the urgency of the services is raised/lowered, resources might migrate in an
   uncoordinated way until all changes are applied. This point should be discussed
   together with 2.), as a separate reload might avoid such problems, especially
   if it could be done for a complete SDM system.


   Suggested Fix/Work Around:
   Currently the SLOs have to be edited by modifying the corresponding component
   configs.



   Analysis:
   In order to allow the separate modification of SLOs a set of commands has to be
   developed. Additionally man pages and wiki documentation for the commands have
   to be updated/created.
   As the service name is unique, it is reasonable to consider serviceName:SloName
   as the unique identifier for any SLO manipulation.
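   A sketch of such an identifier as a small value type. It assumes that service
   names never contain ':' (SLO names may, because parsing splits at the first ':');
   the SloId name and its validation rules are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: SloId is an illustrative helper, not an existing class.
public class SloIdSketch {

    /** Identifies an SLO by <service name>:<SLO name>. */
    record SloId(String serviceName, String sloName) {
        SloId {
            if (serviceName == null || serviceName.isEmpty() || serviceName.contains(":")) {
                throw new IllegalArgumentException("invalid service name: " + serviceName);
            }
            if (sloName == null || sloName.isEmpty()) {
                throw new IllegalArgumentException("invalid SLO name: " + sloName);
            }
        }

        /** Parses "service:slo"; the first ':' separates the two parts. */
        static SloId parse(String text) {
            int sep = text.indexOf(':');
            if (sep < 0) {
                throw new IllegalArgumentException("expected <service>:<slo>, got " + text);
            }
            return new SloId(text.substring(0, sep), text.substring(sep + 1));
        }

        @Override
        public String toString() {
            return serviceName + ":" + sloName;
        }
    }

    public static void main(String[] args) {
        // Records implement equals/hashCode, so SloId works as a map key.
        Map<SloId, String> slos = new HashMap<>();
        slos.put(SloId.parse("ge1:max_pending_jobs"), "<slo definition>");
        System.out.println(slos.containsKey(new SloId("ge1", "max_pending_jobs"))); // true
        System.out.println(SloId.parse("spare_pool:fixed_usage").sloName());        // fixed_usage
    }
}
```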

   1.) addSLO: This command adds a new SLO to an existing service. It opens an
   editor with a default SLO XML template.
      a(dd_)slo -s(ervice) <Service name> -n(ame) <SLO name> -t(emplate) <SLO template>

   2.) removeSLO: This command allows to remove all SLOs/the SLO with the specified
   name from a service.
      r(emove_)slo -s(ervice) <Service name> [-n(ame) <SLO name>]

   3.) modifySLO: This command opens all SLOs / a single SLO in vi to allow modification.
      m(odify_)slo -s(ervice) <Service name> [-n(ame) <SLO name>]

   4.) showSLO: shows all SLOs of a) the system, b) the service, or c) a single SLO.
      s(how_)slo [-s(ervice) <Service name> [-n(ame) <SLO name>]]
   The show command already exists, but it should be extended with the following
   features: a -detail/-all flag and/or a format option might be useful here (to
   list dependent resources, just the name, etc.).

   For the commands that can affect a set of SLOs (remove, show, update), it would
   be helpful to use a filter option to specify the set of SLOs

       [-s(lo_)f(ilter) <e.g. 'type = "MaxPendingJobsSLO"'>]

   or to enumerate them explicitly (or simply allow this for the -n(ame) option)

       [-l(ist) slo1,slo2,...]
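   A sketch of how both selection options could be turned into one predicate over
   SLOs. The filter grammar is not defined yet, so this assumes only the simple form
   <attribute> = "<value>" for the attributes name and type; the Slo record and the
   example SLO names and types are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: only <attribute> = "<value>" with the attributes
// name and type is supported; the real filter grammar is still open.
public class SloFilterSketch {

    record Slo(String name, String type) {}

    private static final Pattern EQUALS =
            Pattern.compile("\\s*(name|type)\\s*=\\s*\"([^\"]*)\"\\s*");

    /** -slo_filter 'type = "MaxPendingJobsSLO"' */
    static Predicate<Slo> parseFilter(String expression) {
        Matcher m = EQUALS.matcher(expression);
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported filter: " + expression);
        }
        String value = m.group(2);
        if (m.group(1).equals("name")) {
            return slo -> slo.name().equals(value);
        }
        return slo -> slo.type().equals(value);
    }

    /** -list slo1,slo2,... */
    static Predicate<Slo> parseList(String list) {
        Set<String> names = Arrays.stream(list.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
        return slo -> names.contains(slo.name());
    }

    /** Shared selection step for remove, show and update. */
    static List<Slo> select(List<Slo> slos, Predicate<Slo> predicate) {
        return slos.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Slo> slos = List.of(
                new Slo("pending_jobs", "MaxPendingJobsSLO"),
                new Slo("fixed", "FixedUsageSLO"),
                new Slo("min", "MinResourceSLO"));
        System.out.println(select(slos, parseFilter("type = \"MaxPendingJobsSLO\"")));
        System.out.println(select(slos, parseList("fixed,min")).size()); // 2
    }
}
```

   Because both options produce a Predicate, the commands that act on a set of SLOs
   can share one selection step regardless of how the set was specified.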

   The above commands should be implemented similar to the corresponding modify
   component/service config commands. But they should only modify the configuration
   subset that concerns the SLO definition.

   A command that should be considered separately is the updateSLO command:
      updateSLO: This command updates the SLOs of a) the system or b) a
   service. The system-wide reload would allow new SLOs to be applied in a
   synchronized way (see Evaluation section). The command might be obsolete if a
   reload is included in commands 1.) - 4.)
      u(date_)slo [-s(ervice) <Service name> [-n(ame) <slo name>]]
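   A sketch of the separate-update idea: the modifying commands only stage the new
   SLO list of a service, and update_slo activates either one service or all staged
   services. Service, StagedSloChanges and their methods are hypothetical. Applying
   all changes in one pass only shortens the window in which services run with a
   mix of old and new SLOs; it does not make the switch atomic across services.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Service and StagedSloChanges are illustrative types.
public class SloUpdateSketch {

    /** Minimal stand-in for a service whose active SLOs can be replaced. */
    static class Service {
        private List<String> activeSlos = List.of();

        List<String> getActiveSlos() { return activeSlos; }

        void setSLOs(List<String> slos) { activeSlos = List.copyOf(slos); }
    }

    /** Collects modified SLO lists; nothing is activated before an update. */
    static class StagedSloChanges {
        private final Map<String, List<String>> pending = new LinkedHashMap<>();

        /** add/mod/remove_slo would only stage the new SLO list of a service. */
        void stage(String serviceName, List<String> slos) {
            pending.put(serviceName, new ArrayList<>(slos));
        }

        /** update_slo -s <service>: activate the staged SLOs of one service. */
        void update(Map<String, Service> services, String serviceName) {
            List<String> slos = pending.remove(serviceName);
            if (slos != null) {
                lookup(services, serviceName).setSLOs(slos);
            }
        }

        /** update_slo without -s: activate all staged changes in one pass. */
        void updateAll(Map<String, Service> services) {
            // Validate first so that an unknown service does not cause a partial update.
            for (String name : pending.keySet()) {
                lookup(services, name);
            }
            for (Map.Entry<String, List<String>> e : pending.entrySet()) {
                lookup(services, e.getKey()).setSLOs(e.getValue());
            }
            pending.clear();
        }

        private static Service lookup(Map<String, Service> services, String name) {
            Service s = services.get(name);
            if (s == null) {
                throw new IllegalArgumentException("unknown service: " + name);
            }
            return s;
        }
    }

    public static void main(String[] args) {
        Map<String, Service> services = Map.of("ge1", new Service(), "ge2", new Service());
        StagedSloChanges changes = new StagedSloChanges();
        changes.stage("ge1", List.of("urgency_high"));
        changes.stage("ge2", List.of("urgency_low"));
        System.out.println(services.get("ge1").getActiveSlos()); // []
        changes.updateAll(services);
        System.out.println(services.get("ge2").getActiveSlos()); // [urgency_low]
    }
}
```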

   A good starting point to implement the functionality is the execute method of
   the inner class ReloadAction in com.sun.grid.grm.service.impl.ge.GEServiceImpl.java:

      ...
      if (getServiceState().equals(ServiceState.RUNNING) ||
          getServiceState().equals(ServiceState.UNKNOWN)) {
          // We have a valid configuration, check if a reconnect is necessary
          if (config.isSameCluster(oldConfig) && jgdi.isConnected()) {
              log.log(Level.INFO, "gsi.sameCluster", getName());
              log.log(Level.FINE, "gsi.reinitSLOs", getName());
              hostManager.stop(false);
              hostManager.start();
              sloManager.interrupt();
              sloManager.setSLOs(config.getSLOs());
              sloManager.setUpdateInterval(config.getSloUpdateInterval().getValueInMillis());
              sloManager.triggerUpdate();
              setState(ComponentState.STARTED);
              // Issue 421: We may have missed an EXECD_DEL or EXECD_ADD event
              //            in the meantime. Trigger mergeResources manually.
              hostManager.mergeResources(jgdi.getExecHostList());
          } else {
              log.log(Level.INFO, "gsi.newCluster", getName());
              // We have a completely new cluster.
              // We really have to stop and restart this component.
              try {
                  StopAction stopAction = new StopAction();
                  stopAction.execute();
                  StartAction startAction = new StartAction();
                  startAction.execute();
              } catch (GrmException ex) {
                  log.log(Level.WARNING, ex.getLocalizedMessage(), ex);
                  throw ex;
              }
          }
      } else {
          // If the service is not running we reconfigure only the SLOManager
          sloManager.setSLOs(config.getSLOs());
          setState(ComponentState.STARTED);
      }

   The behavior can be outlined like this:

      if ServiceState Running/Unknown:
          if managed GE-system is valid:
              =>Restart Host manager (GE Adapter specific not SLO specific)
              =>Restart SLO manager
          else
              =>restart service/component
      else
          =>update sloManager

   This behavior allows the component to be reloaded without shutting down the
   corresponding service, if possible.
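   The outline above, written as a small Java function that returns the steps
   instead of executing them. The ServiceState values (STOPPED is an added
   placeholder for "not running") and the step strings are simplified stand-ins for
   the real GEServiceImpl code; sameClusterAndConnected corresponds to
   config.isSameCluster(oldConfig) && jgdi.isConnected().

```java
import java.util.List;

// Hypothetical sketch of the outline; enum values and step names are
// simplified stand-ins, not the real GEServiceImpl types.
public class SloReloadOutlineSketch {

    enum ServiceState { RUNNING, UNKNOWN, STOPPED }

    /** Returns the steps the reload would perform, in order. */
    static List<String> reloadSteps(ServiceState state, boolean sameClusterAndConnected) {
        if (state == ServiceState.RUNNING || state == ServiceState.UNKNOWN) {
            if (sameClusterAndConnected) {
                // Managed GE system is still valid: keep the service running.
                return List.of("restart host manager", "restart SLO manager");
            }
            // Completely new cluster: the whole component must be restarted.
            return List.of("stop component", "start component");
        }
        // Service not running: only the SLO manager is reconfigured.
        return List.of("update SLO manager");
    }

    public static void main(String[] args) {
        System.out.println(reloadSteps(ServiceState.RUNNING, true));
        System.out.println(reloadSteps(ServiceState.STOPPED, false)); // [update SLO manager]
    }
}
```

   Only the first branch contains the GE Adapter specific host manager restart; a
   dedicated SLO reload would keep just the SLO manager steps.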

   It would be reasonable to separate the SLO reload action from the general
   component reload action. This dedicated updateSLO method needs to be implemented
   by every service (e.g. the spare pool or any future service adapter). The
   functionality cannot be fully implemented as a general function because details
   of the managed service domain have to be considered. Therefore it would be
   reasonable to extend the Service interface with an updateSLO() method, as SLO
   reload is a service-related task and not a general component task.

      com.sun.grid.grm.service.Service

   However, this reload method would be similar to the one in the GrmComponent
   interface:

      /**
       * Triggers a reload of the SLO configuration
       *
       * @throws com.sun.grid.grm.GrmException when an error happened. It can
       * also be a ReloadSLOsNotSupportedException, when the Service does not
       * support the reload.
       */
      public void reloadSLOs() throws GrmException;

   A force option should not be necessary.
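   A sketch of how reloadSLOs() could be added without forcing every existing
   service to implement it: a default method that throws
   ReloadSLOsNotSupportedException, overridden by services that can reload their
   SLOs. GrmException and ReloadSLOsNotSupportedException are redeclared locally as
   stand-ins so the example compiles on its own, and the service classes are
   hypothetical. Default methods need Java 8 or later; on older Java versions an
   abstract base class would play the same role.

```java
// Hypothetical sketch: the exception types are local stand-ins for the
// hedeby classes named in the javadoc above.
public class ReloadSlosSketch {

    static class GrmException extends Exception {
        GrmException(String message) { super(message); }
    }

    static class ReloadSLOsNotSupportedException extends GrmException {
        ReloadSLOsNotSupportedException(String serviceName) {
            super("Service " + serviceName + " does not support reloading SLOs");
        }
    }

    interface Service {
        String getName();

        /**
         * Triggers a reload of the SLO configuration. Services that cannot
         * reload their SLOs keep this default, which signals the missing support.
         */
        default void reloadSLOs() throws GrmException {
            throw new ReloadSLOsNotSupportedException(getName());
        }
    }

    /** Service that supports the reload, e.g. by reconfiguring its SLO manager. */
    static class ReloadingService implements Service {
        int reloads = 0;

        public String getName() { return "spare_pool"; }

        @Override
        public void reloadSLOs() {
            reloads++; // a real service would pass the new SLOs to its SLO manager here
        }
    }

    /** Service that relies on the default implementation. */
    static class LegacyService implements Service {
        public String getName() { return "legacy"; }
    }

    public static void main(String[] args) {
        ReloadingService pool = new ReloadingService();
        pool.reloadSLOs();
        System.out.println(pool.reloads); // 1
        try {
            new LegacyService().reloadSLOs();
        } catch (GrmException ex) {
            System.out.println(ex.getMessage()); // Service legacy does not support reloading SLOs
        }
    }
}
```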

   Finally the commands to modify the component config should still cover the SLO
   modification, as this is a handy way to configure the complete component in one
   step.



   How to test:
   There should be a set of JUnit tests for each new command to check whether each
   command can modify a DummySystem properly. For each command a testsuite test has
   to be developed to test the functionality on the command line level. Each
   command should be tested with different scenarios:

      a) add/remove/modify/show a set of SLOs
      b) add/modify without saving the file to edit
      c) add/remove the same SLO twice to the same service/different services.
      d) add/remove/modify/reload with nonexistent service
      e) remove/modify/reload nonexistent SLO
      f) etc

   At this point, covering the missing "modify component/system" commands with test
   cases should be considered, too. The corresponding tests would be very similar
   to these and should therefore be easy to implement.



   ETC:
   5 PD   design for the SLO management module as part of the Service SDK
   5 PD   Implementation of the SLO management module
   2 PD   Implementation of the UI classes
   2 PD   Implementation of CLI classes
   3 PD   testsuite infrastructure for SLO management
   2 PD   concrete testsuite tests
   1 PD   documentation (wiki, man pages ...)

   20 PD

               ------- Additional comments from rhierlmeier Wed Nov 25 07:21:10 -0700 2009 -------
   Milestone changed
