Opened 10 years ago

Last modified 9 years ago

#717 new enhancement

IZ3134: When a job specified in another job's -hold_jid fails with any non-zero exit code, the dependent job should exit

Reported by: bdbaddog
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 6.2u2
Severity:
Keywords: scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3134]

        Issue #:      3134
       Component:     gridengine
     Subcomponent:    scheduling
        Status:       NEW
      Resolution:
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
          URL:
        Platform:     All
           OS:        All
        Version:      6.2u2
        Priority:     P3
       Issue type:    ENHANCEMENT
    Target milestone: ---
        Reporter:     bdbaddog (bdbaddog)
           CC:        None defined
       * Summary:     When a job specified in another job's -hold_jid fails with any non-zero exit code, the dependent job should exit
   Status whiteboard:
      Attachments:

     Issue 3134 blocks:
   Votes for issue 3134:


   Opened: Fri Sep 18 15:08:00 -0700 2009 
------------------------


Greetings,

I would expect that if I have the following command lines:
qsub -P lp -cwd -N job_1 $fail_script

qsub -P lp -cwd -N job_2 -hold_jid job_1 $pass_script

and job_1 exits with a non-zero exit code (other than exit code 100), job_2 would exit as well and not go on to run.
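
One way to approximate this behaviour without any scheduler change is to drive the dependency from the submitting shell instead of from -hold_jid. The sketch below relies on qsub's -sync y option, which makes qsub wait for the job and exit with the job's exit code. The helper name submit_if_succeeded and the QSUB override variable are hypothetical and exist only for illustration.

```shell
# Sketch only: submit_if_succeeded and QSUB are hypothetical names.
# QSUB can be overridden (for example with a stub) for dry runs.
QSUB=${QSUB:-qsub}

# Usage: submit_if_succeeded FIRST_SCRIPT SECOND_SCRIPT
submit_if_succeeded() {
    first_script=$1
    second_script=$2

    # -sync y blocks until job_1 finishes; qsub then exits with job_1's exit code.
    if "$QSUB" -sync y -P lp -cwd -N job_1 "$first_script"; then
        # job_1 exited 0, so job_2 may run; -hold_jid is no longer needed.
        "$QSUB" -P lp -cwd -N job_2 "$second_script"
    else
        status=$?
        echo "job_1 exited with status $status; job_2 not submitted" >&2
        return "$status"
    fi
}
```

It would be called as submit_if_succeeded "$fail_script" "$pass_script". The cost of this approach is that the submitting shell blocks until job_1 has finished, so it is a workaround rather than a replacement for the requested behaviour.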

   ------- Additional comments from jeffbeadles Mon Sep 21 09:14:30 -0700 2009 -------
Please do NOT do this by default -- it'll break a lot of stuff.

For example, we submit thousands of jobs, with a single hold_jid job running as a "caboose" of the job train.  The caboose job then gathers up
the status of all of the thousands of jobs, and generates a report.  It needs to run regardless of the previous jobs' exit codes.

A typical example of this is when submitting test cases to a grid, and then generating a report of the pass/fail tests.

  -Jeff
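
The caboose pattern described above could be sketched roughly as follows. It assumes, purely for illustration, that every job in the train writes its exit code to a file named <job_name>.status in a shared directory. That convention and the function name report_statuses are hypothetical, not something Grid Engine provides.

```shell
# Sketch only: the *.status file convention is an assumption of this example.
# Usage: report_statuses STATUS_DIR
report_statuses() {
    status_dir=$1
    passed=0
    failed=0

    for status_file in "$status_dir"/*.status; do
        # Skip the unexpanded glob when the directory holds no status files.
        [ -e "$status_file" ] || continue

        job_name=$(basename "$status_file" .status)
        exit_code=$(cat "$status_file")

        if [ "$exit_code" -eq 0 ]; then
            passed=$((passed + 1))
            echo "PASS $job_name"
        else
            failed=$((failed + 1))
            echo "FAIL $job_name (exit code $exit_code)"
        fi
    done

    echo "passed=$passed failed=$failed"
}
```

The caboose script would be submitted with -hold_jid naming the jobs in the train. Because it reports on failures rather than depending on success, it has to start even when some of those jobs exited non-zero.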

   ------- Additional comments from bdbaddog Mon Sep 21 09:26:55 -0700 2009 -------
O.k. how about another flag to indicate that the job is being held for successful completion of another job, and should be canceled if the
job it's waiting on fails?

-hold_succ_jid ?

-Bill
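
Until a flag along these lines exists, similar behaviour can be approximated inside the job scripts while still using the existing -hold_jid. A minimal sketch, assuming both jobs can see a shared directory: the first job records its exit code, and the dependent job checks that record before doing any work. The names run_and_record and run_if_succeeded and the status file are hypothetical.

```shell
# Sketch only: helper names and the status-file convention are assumptions.

# Run a command and record its exit code.
# Usage: run_and_record STATUS_FILE CMD [ARGS...]
run_and_record() {
    status_file=$1
    shift
    "$@"
    status=$?
    echo "$status" > "$status_file"
    return "$status"
}

# Run a command only if the recorded exit code is 0.
# Usage: run_if_succeeded STATUS_FILE CMD [ARGS...]
run_if_succeeded() {
    status_file=$1
    shift
    if [ ! -r "$status_file" ]; then
        echo "no status recorded in $status_file; not running" >&2
        return 1
    fi
    upstream=$(cat "$status_file")
    if [ "$upstream" -ne 0 ]; then
        echo "upstream job exited with status $upstream; not running" >&2
        return 1
    fi
    "$@"
}
```

The script for job_1 would wrap its real work in run_and_record, and the script for job_2, still submitted with -hold_jid job_1, would wrap its work in run_if_succeeded pointing at the same status file. Jobs that must run regardless, such as a reporting caboose, simply skip the check.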
