Opened 14 years ago

Last modified 8 years ago

#300 new defect

IZ1902: insufficient handling of job errors with large array jobs causes huge numbers of mails to be sent

Reported by: andreas Owned by:
Priority: low Milestone:
Component: sge Version: 5.3
Severity: minor Keywords: qmaster
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1902]

  Issue #:               1902
  Reporter:              andreas (andreas)
  Component:             gridengine
  Subcomponent:          qmaster
  Version:               5.3
  Platform:              All
  OS:                    All
  CC:                    seb
  Status:                NEW
  Resolution:
  Priority:              P4
  Issue type:            DEFECT
  Target milestone:      ---
  Assigned to:           ernst (ernst)
  QA Contact:            ernst
  URL:
  Summary:               insufficient handling of job errors with large array jobs causes huge numbers of mails to be sent
  Status whiteboard:
  Attachments:
  Issue 1902 blocks:
  Votes for issue 1902:

   Opened: Thu Nov 17 07:33:00 -0700 2005 
------------------------


DESCRIPTION:
When a large job array is submitted in such a way that its array jobs cannot be
started, the failure has no consequences for the still-pending array jobs of the
same job array. As a result, a submission command such as

  qsub -M andreas.haas@sun.com -m a -o /does/not/exit -t 1-10000 sleeper.sh

triggers delivery of 10000 error mails to the submitter!

SUGGESTED FIX:
Once the first array job of a job array has failed in a way that sets the job
error state, every array job that has not yet been dispatched must be put into
the error state as well.
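
The suggested fix can be illustrated with a small simulation. This is only a sketch under stated assumptions, not qmaster code: `ArrayJob`, `TaskState`, `dispatch_all` and the `propagate_error` switch are hypothetical names invented for this example, and the model assumes exactly one abort mail per array job that fails at start.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    DONE = "done"
    ERROR = "error"


@dataclass
class ArrayJob:
    """Hypothetical stand-in for a job array; not a Grid Engine structure."""
    first: int
    last: int
    states: dict = field(default_factory=dict)
    mails_sent: int = 0

    def __post_init__(self):
        # Every array job in the range first..last starts out pending.
        self.states = {t: TaskState.PENDING
                       for t in range(self.first, self.last + 1)}

    def pending(self):
        return [t for t, s in self.states.items() if s is TaskState.PENDING]


def dispatch_all(job, start_task, propagate_error):
    """Try to start every pending array job in order.

    start_task(task_id) returns False when the array job fails in a way
    that sets the job error state (assumed here: an unusable output path).
    Returns the number of mails that would be sent.
    """
    for task_id in job.pending():
        if job.states[task_id] is not TaskState.PENDING:
            continue  # already put into error state by an earlier failure
        if start_task(task_id):
            job.states[task_id] = TaskState.DONE
            continue
        job.states[task_id] = TaskState.ERROR
        job.mails_sent += 1  # assumed: one abort mail per failed array job
        if propagate_error:
            # Suggested fix: mark everything not yet dispatched as failed too.
            for other in job.pending():
                job.states[other] = TaskState.ERROR
    return job.mails_sent


def always_fails(task_id):
    # Models the unusable -o path from the qsub example above.
    return False


current = dispatch_all(ArrayJob(1, 10000), always_fails, propagate_error=False)
proposed = dispatch_all(ArrayJob(1, 10000), always_fails, propagate_error=True)
print(current, proposed)
```

Without propagation, each of the 10000 array jobs fails on its own and produces its own mail. With propagation, the first failure puts the remaining array jobs into the error state, so in this model only one mail is sent.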

   ------- Additional comments from andreas Thu Nov 17 08:41:13 -0700 2005 -------
Fixed summary typo

   ------- Additional comments from seb Fri Nov 18 02:14:36 -0700 2005 -------
add myself to cc list

   ------- Additional comments from ernst Mon Nov 28 00:36:02 -0700 2005 -------
changed summary

   ------- Additional comments from ernst Tue Dec 13 02:28:52 -0700 2005 -------
changed priority

Change History (1)

comment:1 Changed 8 years ago by dlove

  • Severity set to minor

#166 is a duplicate but has a workaround
