Opened 15 years ago

Last modified 9 years ago

#201 new enhancement

IZ1273: consumable not working as suspend thresholds

Reported by: andy Owned by:
Priority: normal Milestone:
Component: sge Version: 6.0
Severity: Keywords: scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1273]

        Issue #:      1273             Platform:     All           Reporter: andy (andy)
       Component:     gridengine          OS:        All
     Subcomponent:    scheduling       Version:      6.0              CC:  reuti, uddeborg
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    ENHANCEMENT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
          URL:
       * Summary:     consumable not working as suspend thresholds
   Status whiteboard:
      Attachments:

     Issue 1273 blocks:
   Votes for issue 1273:


   Opened: Tue Sep 14 03:42:00 -0700 2004 
------------------------


I came across a weird behavior:

In 6.0 (and 5.3p6) consumables can be used as
load thresholds; however, this does not work for
suspend thresholds.

This makes the following setup impossible, which I
call "slot preemption":

Grid Engine does not allow limiting the number of
slots per host *and* using suspend on subordinate
at the same time: if the host is full, Grid Engine
cannot schedule a job to it, even if doing so would
suspend the "child" queue. The following setup
would be an elegant workaround for this problem:

1. A consumable attribute "nslots" is defined. On
the host level, the total amount of "nslots" is
set; typically it would equal the number of CPUs
on that host.

All jobs except those running in the "low priority
queue" request the "nslots" resource:

    qsub -l nslots=1 ...
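The consumable bookkeeping behind this step can be sketched in Python. This is a hypothetical model, not Grid Engine code: the `Host` class, its `dispatch` method and the host name `node01` are made up for illustration. It only assumes that the host-level "nslots" capacity starts at the number of CPUs, that each `qsub -l nslots=1` job debits one unit, and that low priority jobs debit nothing.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Host:
    """Hypothetical model of one execution host with a host-level consumable."""
    name: str
    nslots_total: int                 # typically the number of CPUs
    nslots_used: int = 0
    jobs: List[Tuple[str, int]] = field(default_factory=list)

    @property
    def nslots_available(self) -> int:
        return self.nslots_total - self.nslots_used

    def dispatch(self, job_id: str, nslots_request: int = 0) -> bool:
        """Start a job if the remaining "nslots" cover its request.

        Low priority jobs request nothing (nslots_request=0), so the
        consumable never blocks them (queue "slots" limits are not modeled).
        """
        if nslots_request > self.nslots_available:
            return False
        self.nslots_used += nslots_request
        self.jobs.append((job_id, nslots_request))
        return True


if __name__ == "__main__":
    host = Host("node01", nslots_total=2)
    print(host.dispatch("low-1"))        # True: no nslots requested
    print(host.dispatch("high-1", 1))    # True
    print(host.dispatch("high-2", 1))    # True
    print(host.dispatch("high-3", 1))    # False: nslots exhausted
    print(host.nslots_available)         # 0
```

Once "nslots" reaches 0 on a host, further `-l nslots=1` jobs no longer fit there, while the low priority job is still accepted.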

Jobs submitted to the "low priority queue" do not
request "nslots"; instead, the queue would be
configured as follows:

    slots               <ncpus>
    load_thresholds     <whatever_is_required>
    suspend_thresholds  nslots=0

In this setup, once the jobs started in the higher
priority queue have used up the "nslots" of that
host, the low priority queue would be suspended,
with the effect that no new jobs are started in it
and its running jobs are suspended.
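The behavior this setup relies on can be sketched as a small Python model. It is hypothetical and not how Grid Engine implements thresholds: the class and function names are invented, and it assumes "nslots" is defined with relop `<=`, so a threshold of `nslots=0` counts as exceeded once the host's available amount drops to 0. It contrasts a load threshold (no new jobs, running jobs continue) with the requested suspend threshold (running jobs suspended, and assumed to be resumed once the threshold is no longer exceeded).

```python
import operator
from dataclasses import dataclass, field
from typing import List

RELOPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt,
          ">": operator.gt, "==": operator.eq, "!=": operator.ne}


def threshold_exceeded(value: float, relop: str, threshold: float) -> bool:
    """A threshold counts as exceeded when "value relop threshold" holds."""
    return RELOPS[relop](value, threshold)


@dataclass
class LowPriorityQueue:
    """Hypothetical model of the low priority queue instance on one host."""
    load_threshold: float = 0        # e.g. nslots=0 used as a load threshold
    suspend_threshold: float = 0     # suspend_thresholds nslots=0
    relop: str = "<="                # assumed relop of the "nslots" complex
    running: List[str] = field(default_factory=list)
    suspended: List[str] = field(default_factory=list)

    def accepts_new_jobs(self, nslots_available: int) -> bool:
        # Load threshold: when exceeded, no new jobs are dispatched,
        # but jobs that are already running keep running.
        return not threshold_exceeded(nslots_available, self.relop,
                                      self.load_threshold)

    def apply_suspend_threshold(self, nslots_available: int) -> None:
        # Requested behavior: an exceeded suspend threshold suspends running
        # jobs and they resume once it is no longer exceeded.
        # Simplification: real queues suspend "nsuspend" jobs per
        # "suspend_interval"; this model suspends all of them at once.
        if threshold_exceeded(nslots_available, self.relop,
                              self.suspend_threshold):
            self.suspended.extend(self.running)
            self.running.clear()
        else:
            self.running.extend(self.suspended)
            self.suspended.clear()
```

For example, with two low priority jobs running, calling `apply_suspend_threshold(0)` moves both to `suspended`, and a later `apply_suspend_threshold(1)` (a high priority job has finished) resumes them.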

The scheme works nicely for "load_thresholds";
however, it does not work for "suspend_thresholds".

   ------- Additional comments from sgrell Thu Nov 4 03:24:18 -0700 2004 -------
The implementation should also take into account that in one
scheduler run two jobs could be started, one of which would be
suspended right away because of the other.
That is already an issue, but with a proper setting of load and suspend
thresholds, it is very unlikely to happen.

We also want the suspend threshold to be evaluated whenever a job
is dispatched.

Stephan
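The race described in this comment can be illustrated with a hypothetical Python model of one scheduler run on a single host, with the suspend threshold evaluated after every dispatch. The function `schedule_run`, the "high"/"low" queue labels and the event names are invented for this sketch. It assumes "high" jobs request `nslots=1`, "low" jobs request nothing, and the low priority queue's suspend threshold is `nslots=0` with relop `<=`. Deferring a low priority job while the threshold is already exceeded is one possible policy, assumed here.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (action, job_id)


def schedule_run(nslots_total: int, pending: List[Tuple[str, str]]) -> List[Event]:
    """Hypothetical single scheduler run on one host.

    pending: (job_id, queue) pairs in dispatch order; queue is "high"
    (requests nslots=1) or "low" (requests no nslots).
    """
    available = nslots_total
    low_running: List[str] = []
    events: List[Event] = []

    def suspend_threshold_exceeded() -> bool:
        return available <= 0          # nslots=0 with assumed relop "<="

    for job_id, queue in pending:
        if queue == "high":
            if available < 1:
                events.append(("defer", job_id))
                continue
            available -= 1
        elif suspend_threshold_exceeded():
            # Starting it now would only get it suspended immediately.
            events.append(("defer", job_id))
            continue
        else:
            low_running.append(job_id)
        events.append(("start", job_id))
        # Re-evaluate the suspend threshold after every dispatch,
        # not only once per scheduler run.
        if suspend_threshold_exceeded():
            events.extend(("suspend", j) for j in low_running)
            low_running.clear()
    return events
```

With one nslot and the pending order `low-1`, `high-1`, `low-2`, the run yields start `low-1`, start `high-1`, suspend `low-1`, defer `low-2`: a job is suspended in the same run that started it, because of the other job.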

   ------- Additional comments from templedf Mon Dec 6 01:47:03 -0700 2004 -------
Looks easy enough to fix.  The question, though, is what the correct
behavior is.  Here's what the /id/ command does:

% id gidtest
uid=60003(gidtest) gid=60003

It simply ignored the name.  I don't think we have that option.
I would assume that if the gid can't be resolved into a name, the gid
should be the name, i.e. in the case above, "60003" would be stored as
the group name.  Another alternative would be to just name the group
"UNKNOWN".  My only issue with that option is how to tell two unknown
groups apart if they're both called "UNKNOWN".
Comments?  If no one voices an opinion by tomorrow, I will use the gid
as the group name.

   ------- Additional comments from templedf Mon Dec 6 01:48:09 -0700 2004 -------
Oops!  Wrong Issue!

   ------- Additional comments from sgrell Tue Dec 6 08:19:14 -0700 2005 -------
Changed the Subcomponent.

Stephan

   ------- Additional comments from sgrell Mon Dec 12 03:25:26 -0700 2005 -------
This describes an RFE.

Stephan

   ------- Additional comments from reuti Thu Oct 23 02:45:50 -0700 2008 -------
adding myself as cc.
