Opened 11 years ago

Last modified 9 years ago

#583 new defect

IZ2761: Jobs not scheduled because of "-l NONE"

Reported by: reuti Owned by:
Priority: normal Milestone:
Component: sge Version: 6.2
Severity: Keywords: scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2761]

        Issue #:      2761             Platform:     All      Reporter: reuti (reuti)
       Component:     gridengine          OS:        All
     Subcomponent:    scheduling       Version:      6.2         CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
          URL:
       * Summary:     Jobs not scheduled because of "-l NONE"
   Status whiteboard:
      Attachments:

     Issue 2761 blocks:
   Votes for issue 2761:


   Opened: Wed Oct 22 06:57:00 -0700 2008 
------------------------


In a cluster setup with a consumable complex:

$ qconf -sc
#name               shortcut   type        relop requestable consumable default  urgency
#----------------------------------------------------------------------------------------
total_slots         ts         INT         <=    NO          YES        1        0

And attached to a node:

$ qconf -se pc15370
complex_values        total_slots=8

Serial jobs run fine; e.g. with two jobs running, the consumable is decremented as expected:

$ qhost -F total_slots
    Host Resource(s):      hc:total_slots=6.000000

But a parallel job is never scheduled; qstat reports the following error:

$ qstat -j 99
parallel environment:  openmp range: 2
scheduling info:            (-l NONE) cannot run in queue "pc15370.xxx.xxx.xxx" because job requests non requestable resource total_slots"
                            cannot run in PE "openmp" because it only offers 0 slots

When I make the consumable complex "total_slots" requestable (and change nothing else), the job starts in the next scheduling interval, and
the consumable "total_slots" is reported correctly:

$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
     99 0.55500 test.sh    reuti        r     10/22/2008 15:49:55 all.q@pc15370.xxx.xxx     2
$ qhost -F total_slots
    Host Resource(s):      hc:total_slots=6.000000
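
For anyone who wants to reproduce the workaround, the "requestable" change can be sketched as a small filter over `qconf -sc` style output. This is only an illustration: the function name make_requestable is hypothetical, and the commented-out qconf calls at the end are an untested assumption about how the result might be loaded back (via `qconf -Mc`). The demonstration line is the complex entry from this report and runs without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): reads complex definitions
# in `qconf -sc` column layout from stdin and sets the "requestable"
# column (5th field) to YES for the complex named in $1.
# Note: awk rebuilds the modified line with single spaces between fields.
make_requestable() {
    awk -v name="$1" '$1 == name { $5 = "YES" } { print }'
}

# Demonstration on the entry shown in this report:
printf '%s\n' 'total_slots         ts         INT         <=    NO          YES        1        0' \
    | make_requestable total_slots

# On a real cluster the filter might be used like this (untested sketch):
#   qconf -sc | make_requestable total_slots > /tmp/complexes.txt
#   qconf -Mc /tmp/complexes.txt
```

Comment lines and all other complexes pass through unchanged, since only a line whose first field equals the given name is modified.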

(If someone would like to change the summary entry, please do; I don't know how to give this issue a short, appropriate heading.)

   ------- Additional comments from reuti Tue Jul 14 04:59:17 -0700 2009 -------
This behavior is still present in 6.2u3, and AFAICS it now affects serial jobs too (although without an error message like "-l NONE"). With the option to set a complex to
"consumable job", the number of jobs per user can now be limited via an RQS using a complex such as job_limit. But this job_limit has to be "requestable yes"; otherwise neither serial nor
parallel jobs will start. Having it requestable means a user could bypass the limit.

$ qconf -srqs user_limit
{
   name         user_limit
   description  Job limit per user
   enabled      TRUE
   limit        name serial_limit users {*} queues serial.q to job_limit=2
   limit        name parallel_limit users {*} queues parallel.q to job_limit=2
}

   ------- Additional comments from reuti Tue Oct 27 06:54:44 -0700 2009 -------
Changed summary.
