Opened 14 years ago

Last modified 9 years ago

#345 new enhancement

IZ2036: Need two types of "disabled"

Reported by: saalwaechter Owned by:
Priority: normal Milestone:
Component: sge Version: 6.0u4
Severity: Keywords: qmaster
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2036]

        Issue #:      2036             Platform:     All           Reporter: saalwaechter (saalwaechter)
       Component:     gridengine          OS:        All
     Subcomponent:    qmaster          Version:      6.0u4            CC:    reuti
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    ENHANCEMENT
                                   Target milestone: ---
      Assigned to:    sgrell (sgrell)
      QA Contact:     ernst
          URL:
       * Summary:     Need two types of "disabled"
   Status whiteboard:
      Attachments:

     Issue 2036 blocks:
   Votes for issue 2036:


   Opened: Mon Apr 24 08:26:00 -0700 2006 
------------------------


In our SGE environment we have many nodes, each with many queue instances. Our
grid administrators/designers often need to indefinitely disable certain queue
instances to test or implement specific advanced behaviors.  Our break-fix
team, which is a separate group, needs to disable queues on broken nodes, then
enable them after the repair is finished.

Right now those two separate use cases collide with each other. For example,
the admin team has disabled a set of queues, and wants them to stay disabled.
Then a node with one of those queues fails.  The break-fix team (re)disables
the queues on that node idempotently.  They fix the node.  But afterwards people
cannot tell whether the repaired node's queues were disabled for the repair work
or for design reasons.  Often the break-fix team simply enables the queues,
which silently undoes the design configuration.

Can we implement a two-level disable?  One level is "this queue is disabled
intentionally for design reasons".  The second level is "this queue is disabled
for maintenance reasons".  The two levels would be ANDed together to determine
the queue's real enabled/disabled state.  Either setting could be toggled
independently, but only if both settings were "enabled" would the queue be
enabled.

This approach would give each of the two independent use cases (design vs.
maintenance) its own separate control.
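
A minimal sketch of the proposed semantics, assuming a queue instance simply carries two independent boolean flags. The class and the field names `design_disabled` and `maintenance_disabled` are hypothetical and are not part of SGE; the queue name `all.q@node17` is illustrative only. The effective state is the AND of the two "enabled" levels, so the break-fix team can clear its own flag without touching the design flag:

```python
from dataclasses import dataclass


@dataclass
class QueueInstance:
    """Hypothetical model of a queue instance with two independent disable levels."""

    name: str
    design_disabled: bool = False       # owned by the grid administrators/designers
    maintenance_disabled: bool = False  # owned by the break-fix team

    @property
    def enabled(self) -> bool:
        # Both levels must be "enabled" for the queue instance to be enabled.
        return not self.design_disabled and not self.maintenance_disabled


# Replay the collision scenario from the description.
q = QueueInstance("all.q@node17")
q.design_disabled = True        # admins disable it for design reasons
q.maintenance_disabled = True   # node fails; break-fix disables it (idempotent)
q.maintenance_disabled = False  # repair done; break-fix re-enables its own level
print(q.enabled)  # False: the design-level disable survives the repair
```

Keeping two separate fields, rather than one shared flag, means each team owns exactly one of them, so neither can undo the other's intent.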

   ------- Additional comments from reuti Thu May 18 12:41:09 -0700 2006 -------
We use user_lists for this purpose: setting a queue's user_lists to an access list such as sge_team_disabled
(or whatever you like) disables the queue for all other users.

   ------- Additional comments from saalwaechter Thu May 25 11:35:48 -0700 2006 -------
It's an interesting idea to use the user_lists.  That wouldn't work directly
for our environment because we already make use of user_lists for actual access
control.  We could remove the original user_lists entries for the affected
nodes and use the sge_team_disabled idea, but then someone would have to
remember to put the user_lists back to their original state.

Perhaps xuser_lists could be used.  The documentation doesn't state the
precedence order between user_lists and xuser_lists.  The downside of
xuser_lists is that one would have to be careful to make sure all users are
contained in the exclude list.

   ------- Additional comments from saalwaechter Fri Apr 6 12:21:33 -0700 2007 -------
Quick update. I couldn't use Reuti's user list suggestion for this
issue, for the reasons already mentioned in this ticket.  In
the end I used a project instead.  I created one called "queue_disabler_proj",
which I then add to the projects list of each queue I want to disable.

This works because we don't otherwise use projects in our environment.
Note that a sneaky user could technically bypass this control by just
setting their job's project to queue_disabler_proj.  I could solve that
by putting a dummy user in the project's internal ACL, but I'm not too
worried about that risk at this point.
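
A rough model of why the project workaround disables a queue, and of the bypass and its fix. This is a simplified sketch, not SGE code: it assumes a queue with a non-empty projects list only accepts jobs submitted under one of the listed projects, and that a project with a non-empty ACL only admits the users in it. The function names and the user names (`alice`, `nobody_dummy`) are hypothetical.

```python
from typing import Dict, Optional, Set

DISABLER = "queue_disabler_proj"


def may_use_project(user: str, project: Optional[str], acls: Dict[str, Set[str]]) -> bool:
    """A job without a project is always fine; otherwise consult the project's ACL.

    An empty or missing ACL is treated as "open to everyone" in this sketch.
    """
    if project is None:
        return True
    acl = acls.get(project, set())
    return not acl or user in acl


def queue_accepts(queue_projects: Set[str], user: str, project: Optional[str],
                  acls: Dict[str, Set[str]]) -> bool:
    if not may_use_project(user, project, acls):
        return False
    # A non-empty projects list admits only jobs submitted under a listed project.
    return not queue_projects or project in queue_projects


disabled_queue = {DISABLER}
open_acls: Dict[str, Set[str]] = {}
locked_acls = {DISABLER: {"nobody_dummy"}}  # hypothetical dummy user

print(queue_accepts(disabled_queue, "alice", None, open_acls))        # False: queue is "disabled"
print(queue_accepts(disabled_queue, "alice", DISABLER, open_acls))    # True: the sneaky bypass
print(queue_accepts(disabled_queue, "alice", DISABLER, locked_acls))  # False: ACL closes the bypass
print(queue_accepts(set(), "alice", None, open_acls))                 # True: ordinary queue
```

The trick depends on ordinary jobs carrying no project at all, which is why it only works in a site that does not otherwise use projects.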

Finally, I see that the documentation does clearly specify the precedence
of user_lists vs. xuser_lists.  The general rule is that exclude settings
take precedence over include settings when the two conflict.
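
The precedence rule can be sketched as a small access check. This is a simplified model, assuming only the queue-level user_lists and xuser_lists matter (it ignores the cluster- and host-level lists and other SGE details); the function name and the list contents are hypothetical. The third call also shows reuti's suggestion: pointing a queue's user_lists at a list such as sge_team_disabled locks out everyone who is not on that list.

```python
from typing import Dict, List, Set


def user_may_access(user: str, user_lists: List[str], xuser_lists: List[str],
                    access_lists: Dict[str, Set[str]]) -> bool:
    """Simplified queue access check: exclude lists beat include lists."""

    def member_of_any(names: List[str]) -> bool:
        return any(user in access_lists.get(n, set()) for n in names)

    if member_of_any(xuser_lists):
        return False  # exclusion wins, even if the user is also in an include list
    if user_lists:
        return member_of_any(user_lists)  # a non-empty include list acts as a whitelist
    return True  # no lists configured: everyone has access


lists = {"engineers": {"alice", "bob"}, "sge_team_disabled": {"admin1"}}
print(user_may_access("alice", ["engineers"], [], lists))             # True
print(user_may_access("alice", ["engineers"], ["engineers"], lists))  # False
print(user_may_access("alice", ["sge_team_disabled"], [], lists))     # False
```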

   ------- Additional comments from saalwaechter Fri Apr 6 12:24:51 -0700 2007 -------
Oh, I'd still like to see two separate types of disabled, as originally
requested.  The "user_lists" and "projects" workarounds mentioned in this
ticket only work if the mechanism in question is not already being used
for its intended purpose.

I was just fortunate that we weren't using projects, which let me repurpose
them for disabling queues.

   ------- Additional comments from ernst Thu Aug 21 02:17:21 -0700 2008 -------
Changed subcomponent
