Opened 12 years ago

Last modified 9 years ago

#428 new defect

IZ2255: parallel jobs can exceed resource quotas if resources are allocated from multiple queues

Reported by: andreas Owned by:
Priority: normal Milestone:
Component: sge Version: 6.1
Severity: Keywords: scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2255]

        Issue #:      2255             Platform:     All      Reporter: andreas (andreas)
       Component:     gridengine          OS:        All
     Subcomponent:    scheduling       Version:      6.1         CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
          URL:
       * Summary:     parallel jobs can exceed resource quotas if resources are allocated from multiple queues
   Status whiteboard:
      Attachments:

     Issue 2255 blocks:
   Votes for issue 2255:


   Opened: Mon May 21 03:05:00 -0700 2007 
------------------------


DESCRIPTION:
A cluster-global resource limit, specified as the resource quota rule

   limit      to  F001=1

can be exceeded by a parallel job

> qsub -pe intelmpi 2 -l F001=1 -b y /bin/sleep 3600

if more than one cluster queue on a host is suitable for the parallel
environment:

> qconf -sq all.q | egrep "qname|pe_list"
qname                 all.q
pe_list               intelmpi
> qconf -sq second.q | egrep "qname|pe_list"
qname                 second.q
pe_list               intelmpi

The intelmpi parallel environment is configured as follows:

pe_name           intelmpi
slots             100
user_lists        NONE
xuser_lists       NONE
start_proc_args   NONE
stop_proc_args    NONE
allocation_rule   $pe_slots
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min


> qstat -g t
job-ID  prior   name       user         state submit/start at     queue          master ja-task-ID
---------------------------------------------------------------------------------------------------
    264 0.60500 sleep      ah114088     r     05/18/2007 12:39:14 all.q@ents     SLAVE
    264 0.60500 sleep      ah114088     r     05/18/2007 12:39:14 second.q@ents  MASTER
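
To spot affected jobs, the queue column of the listing above can be grouped per job and host. The following is only a minimal sketch: it assumes the (job-ID, queue instance) pairs have already been extracted from the qstat output, and the helper names queues_per_job and jobs_spanning_queues are hypothetical, not part of Grid Engine.

```python
from collections import defaultdict


def queues_per_job(rows):
    """Map job id -> host -> set of cluster queues.

    `rows` is an iterable of (job_id, queue_instance) pairs, where a
    queue instance looks like "all.q@ents" (cluster queue @ host).
    """
    result = defaultdict(lambda: defaultdict(set))
    for job_id, queue_instance in rows:
        cluster_queue, _, host = queue_instance.partition("@")
        result[job_id][host].add(cluster_queue)
    return result


def jobs_spanning_queues(rows):
    """Return sorted job ids holding slots in more than one cluster queue on one host."""
    spanning = set()
    for job_id, hosts in queues_per_job(rows).items():
        if any(len(queues) > 1 for queues in hosts.values()):
            spanning.add(job_id)
    return sorted(spanning)


# The two rows taken from the qstat listing in this report.
rows = [("264", "all.q@ents"), ("264", "second.q@ents")]
print(jobs_spanning_queues(rows))
```

For the rows in this report, job 264 is flagged because it holds slots in both all.q and second.q on host ents.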

The problem also exists if the PE is added to the scope of the limit:

   limit   pes intelmpi   to  F001=1
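
The arithmetic behind the overrun can be illustrated with a toy model. This is a sketch of the symptom only, not of the scheduler's actual code: it assumes F001 is a consumable debited once per slot, and the functions per_queue_check and aggregate_check are hypothetical names introduced for illustration.

```python
def per_queue_check(slots_by_queue, per_slot_request, limit):
    """Pass if each queue instance, looked at in isolation, stays within the limit."""
    return all(n * per_slot_request <= limit for n in slots_by_queue.values())


def aggregate_check(slots_by_queue, per_slot_request, limit):
    """Pass only if the job's total consumption across all queues stays within the limit."""
    total_slots = sum(slots_by_queue.values())
    return total_slots * per_slot_request <= limit


# Job 264: one slot in each queue, -l F001=1 per slot, quota F001=1.
slots_by_queue = {"all.q@ents": 1, "second.q@ents": 1}
print(per_queue_check(slots_by_queue, per_slot_request=1, limit=1))  # True
print(aggregate_check(slots_by_queue, per_slot_request=1, limit=1))  # False
```

Under these assumptions the job consumes 2 units of F001 in total against a limit of 1, which matches the observed behaviour: the allocation looks valid when each queue is considered on its own, but not when the job is considered as a whole.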

Change History (0)
