Opened 11 years ago

Last modified 9 years ago

#613 new defect

IZ2838: enforce limit option might fail when execd for a slave or master parallel task is restarted

Reported by: crei
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 6.2u1
Severity:
Keywords: kernel
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2838]

        Issue #:      2838             Platform:     All      Reporter: crei (crei)
       Component:     gridengine          OS:        All
     Subcomponent:    kernel           Version:      6.2u1       CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    ernst (ernst)
      QA Contact:     andreas
          URL:
       * Summary:     enforce limit option might fail when execd for a slave or master parallel task is restarted
   Status whiteboard:
      Attachments:

     Issue 2838 blocks:
   Votes for issue 2838:


   Opened: Wed Dec 17 05:32:00 -0700 2008 
------------------------


The enforce-limit settings may fail to delete a job when some of the tasks
of a parallel job run on an execd that was shut down and then restarted.

This scenario comes from a testsuite test:

- set up a queue on allhosts with an h_rt limit of 30 seconds
- set qmaster_params to
  ENABLE_ENFORCE_MASTER_LIMIT=true,ENABLE_FORCED_QDEL_IF_UNKNOWN=true
- submit a tightly integrated parallel job that runs longer than 30 seconds
  (e.g. 120 seconds)
- wait until all tasks are running
- shut down all execds on which parts of the PE job are running
- start the execd of a slave or master task again

The job should be terminated before its normal runtime ends (i.e. once the
30-second h_rt limit is exceeded), but it is not.

Change History (0)
