Opened 11 years ago

Last modified 9 years ago

#499 new defect

IZ2538: Using RQS makes scheduling by seqno to disregard some nodes

Reported by: reuti
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 6.1u3
Severity:
Keywords: Macintosh scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2538]

            Issue #:   2538
            Summary:   Using RQS makes scheduling by seqno to disregard some nodes
          Component:   gridengine
       Subcomponent:   scheduling
            Version:   6.1u3
           Platform:   Macintosh
                 OS:   All
             Status:   NEW
         Resolution:
           Priority:   P3
         Issue type:   DEFECT
   Target milestone:   ---
           Reporter:   reuti (reuti)
        Assigned to:   andreas (andreas)
         QA Contact:   andreas
                 CC:   None defined
                URL:
  Status whiteboard:
        Attachments:

     Issue 2538 blocks:
   Votes for issue 2538:


   Opened: Wed Apr 2 08:26:00 -0700 2008 
------------------------


Prerequisite: SGE is set up with sensible sequence numbers in the queue definition(s), and
queue_sort_method is set to seqno in the scheduler configuration. This setup works fine on its own.

When, in addition, at least one RQS is defined, some nodes might be disregarded completely by the
scheduler. This can happen even if there is just one RQS for a single node, like:

{
   name         general
   description  General SGE limits
   enabled      TRUE
   limit        name slots queues !cleaner.q hosts node22 to slots=8
}

and the other nodes won't get any jobs. After changing the limit back and forth to:

 limit        name slots queues !cleaner.q hosts {*} to slots=8

the other nodes are sometimes considered again.

On the other hand, when jobs are waiting in such a configuration, changing queue_sort_method to load
starts them immediately.

The odd thing is that it does not happen all the time. Sometimes removing the RQS and adding it again
makes the problem go away. Sorry for not having more detailed information.

   ------- Additional comments from reuti Mon Sep 8 03:19:33 -0700 2008 -------
It seems that this happens only when the notation {*} is used for the hosts. If all hosts are listed
by name, it works. On big clusters this means a long list; I only tried it on a small cluster with
8 nodes.
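
For larger clusters, the explicit host list used in this workaround could be generated rather than typed by hand. The following is only a sketch: the function name rqs_limit_line and the host names node01..node08 are made up for illustration, and it assumes that a brace-enclosed list such as {node01,node02} is meant to take the place of {*}, so that the limit still applies to each host separately. The exact syntax used in the original test was not given in this report. The real host names could be taken from, e.g., the output of "qconf -sel".

```python
def rqs_limit_line(hosts, excluded_queue="cleaner.q", slots=8):
    """Build an RQS limit line that names every host instead of using {*}.

    All names here are illustrative; the rule layout mirrors the limit
    line quoted in this report.
    """
    if not hosts:
        raise ValueError("at least one host name is required")
    # Assumption: a brace-enclosed, comma-separated list applies the limit
    # to each listed host separately, as {*} does for all hosts.
    host_list = ",".join(hosts)
    return (
        f"limit        name slots queues !{excluded_queue} "
        f"hosts {{{host_list}}} to slots={slots}"
    )


# Hypothetical 8-node cluster, matching the size mentioned in the comment.
example_hosts = [f"node{i:02d}" for i in range(1, 9)]
example_line = rqs_limit_line(example_hosts)
```

The resulting line would replace the {*} limit line inside the rule set, for instance when editing it with "qconf -mrqs general".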

Change History (0)