Opened 16 years ago

Last modified 9 years ago

#152 new feature

IZ920: Need a means to facilitate workflows in a scalable and better to handle fashion

Reported by: andreas Owned by:
Priority: normal Milestone:
Component: sge Version: current
Severity: Keywords: scheduling
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=920]

        Issue #:      920              Platform:     All       Reporter: andreas (andreas)
       Component:     gridengine          OS:        All
     Subcomponent:    scheduling       Version:      current      CC:       svdavidson
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    FEATURE
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
          URL:
       * Summary:     Need a means to facilitate workflows in a scalable and better to handle fashion
   Status whiteboard:
      Attachments:
                      Date/filename: Fri Apr 30 07:34:00 -0700 2004: feature-spec.sxd
                      Description:   An openoffice diagram of an array dependency pattern description (application/octet-stream)
                      Submitted by:  daireb

     Issue 920 blocks:
   Votes for issue 920:


   Opened: Thu Mar 25 04:21:00 -0700 2004 
------------------------


DESCRIPTION:
Grid Engine should provide a better means of
handling workflows in a scalable and manageable
fashion. End users wish to run large workflows
on the order of 35,000 single jobs. Grid Engine
6.0 already supports job dependencies
(-hold_jid), but with 35,000 jobs it takes a
long time to pump all these jobs *individually*
into Grid Engine. Users also expect to treat
such a workflow as a whole, in a scalable
fashion, with operations such as qdel/qhold and
with qstat/qmon. The current understanding is
that CLI support is the primary request, but API
support might be beneficial as well. A means to
monitor workflow progress (GUI) would certainly
be nice.

WORKAROUND:
The current means of handling workflows are the
submit(1) -hold_jid option and qmake(1).
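
As a rough illustration of the -hold_jid workaround, the sketch below
builds one qsub command line per job and orders them so that every job
named in a -hold_jid list is submitted before the job that waits on it.
The workflow, the job names and the script names are hypothetical, and
the commands are only printed, not executed. It uses the qsub options
-N (job name) and -hold_jid (comma-separated list of jobs to wait for).

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_submit_commands(workflow, script_for):
    """Return one qsub argv per job, with predecessors listed first.

    workflow maps a job name to the set of job names it depends on.
    script_for maps a job name to its (hypothetical) job script.
    """
    commands = []
    for job in TopologicalSorter(workflow).static_order():
        argv = ["qsub", "-N", job]
        deps = sorted(workflow.get(job, ()))
        if deps:
            # Hold this job until all named predecessors have finished.
            argv += ["-hold_jid", ",".join(deps)]
        argv.append(script_for[job])
        commands.append(argv)
    return commands


# Hypothetical workflow: "merge" waits for "prep_a" and "prep_b".
workflow = {"prep_a": set(), "prep_b": set(), "merge": {"prep_a", "prep_b"}}
scripts = {name: f"{name}.sh" for name in workflow}

commands = build_submit_commands(workflow, scripts)
for argv in commands:
    print(" ".join(argv))
```

Each job still needs its own submit call, which is the scaling
problem this issue describes for workflows of 35,000 jobs.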

HOWTOFIX:
Because of the scalability aspect, the key to
this RFE appears to be a design that allows large
numbers of mostly individual but interdependent
jobs to be grouped into something that can be
sent to qmaster as a whole. End users would
probably expect the workflow submit operation to
return a single job id that is then usable as a
handle (qmod/qdel/qstat). End users might also
appreciate each job in a workflow being
monitorable and controllable as a workflow task.

   ------- Additional comments from andreas Fri Apr 30 04:55:40 -0700 2004 -------
It appears desirable to allow as much flexibility with
workflow tasks as possible: ideally each task could have different
resource requirements, a different command, a different environment, etc.
On the other hand, it is important to keep the overall
amount of data transferred to qmaster, and the amount of data
kept in qmaster, small.

To achieve both goals to the extent possible, it might be useful
to design the interface for expressing each task so that
any attribute known from sequential jobs can differ from task to task,
while commonalities are exploited transparently behind the scenes
to reduce the memory footprint.




   ------- Additional comments from andreas Fri Apr 30 04:58:34 -0700 2004 -------
Another aspect appears to be email notification:
there might be cases when notification is required on
a per-workflow-task basis. In addition to this, there
is also a need for email notification for the workflow
as a whole.

   ------- Additional comments from daireb Fri Apr 30 07:34:18 -0700 2004 -------
Created an attachment (id=13)
An openoffice diagram of an array dependency pattern description

   ------- Additional comments from daireb Fri Apr 30 07:35:08 -0700 2004 -------
Well I thought I'd add an end user perspective to this issue as it's
particularly close to my heart! I evaluated SGE for 2 months and I
found a couple of problems with the software which were severe enough
to put doubt in my superiors' minds and effectively end our interest in
SGE (we are looking to develop our own software now!).

The main problem was trying to bend SGE around our rendering pipeline.
We have an in-house tool in which you can easily set up all the stages
of a render. An example might be: stage1 - render a "beauty" pass,
stage2 - render a "shadow" pass, stage3 - composite both stages
together. Now each of these stages comprises a frame range whereby
each frame is a job in SGE. Currently the only way to submit this
correctly to SGE would be to submit every frame of every stage and
define the dependencies as you do. With 3 stages of 100 frames each
that's 300 jobs. We need to be able to do renders with many, many
stages. We quickly get into thousands of frames/jobs. We also need to
be able to "clump" frames together (i.e. an SGE job renders more than
one frame sequentially). Finally we need to be able to render frame
ranges across the stages (e.g. I want frames 1-21 & 34-70 rendered).
Dependencies need to be very flexible to deal with these scenarios.
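
The frame range and "clump" requirements above can be sketched roughly
as follows. This is only an illustration of the bookkeeping a submit
tool would need. The function names and the clump size of 10 are
assumptions, and nothing here is an existing SGE interface.

```python
def parse_frame_ranges(spec):
    """Expand a spec such as "1-21 & 34-70" into a sorted frame list."""
    frames = set()
    for part in spec.split("&"):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        start = int(first)
        end = int(last) if last else start  # a bare "5" means frame 5 only
        frames.update(range(start, end + 1))
    return sorted(frames)


def clump(frames, size):
    """Group frames so that one job renders up to `size` frames in sequence."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]


frames = parse_frame_ranges("1-21 & 34-70")
chunks = clump(frames, 10)  # clump size 10 is an arbitrary example value
print(len(frames), "frames in", len(chunks), "jobs per stage")
```

Note that a clump may span the gap between two ranges, so any
dependency between stages has to be expressed in terms of the frames a
job actually contains, not just its position in the array.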

Array jobs would be ideal for us as frames are basically the same job
but with a slightly different command line (frame number /
$SGE_TASK_ID). Using the above example, SGE can currently only define
that all of stage 3 is dependent (-hold_jid) on all the frames/jobs of
stage 1 and stage 2. We require that frame 1 of stage 3 is rendered
as soon as frame 1 of stage 1 and stage 2 has finished, then frame 2,
and so on.
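
To make the difference concrete, here is a small model of the two
behaviours: a whole-array hold, where no stage 3 frame can start until
every upstream frame is done, and the per-frame hold requested here.
It is a plain simulation with assumed names and made-up completion
state, not a description of how SGE implements dependencies.

```python
def ready_whole_array(done_by_stage, upstream, frames):
    """Whole-array hold: nothing is ready until all upstream frames finish."""
    wanted = set(frames)
    if all(done_by_stage[stage] >= wanted for stage in upstream):
        return sorted(wanted)
    return []


def ready_per_frame(done_by_stage, upstream, frames):
    """Per-frame hold: frame N is ready once frame N of each upstream stage is done."""
    return [f for f in sorted(frames)
            if all(f in done_by_stage[stage] for stage in upstream)]


frames = range(1, 101)                                  # 100 frames per stage
done = {"stage1": {1, 2, 3}, "stage2": {1, 2}}          # hypothetical progress
upstream = ["stage1", "stage2"]

whole = ready_whole_array(done, upstream, frames)
per_frame = ready_per_frame(done, upstream, frames)
print("whole-array hold:", whole)
print("per-frame hold:  ", per_frame)
```

With the whole-array hold, stage 3 sits idle until all 200 upstream
frames finish, whereas the per-frame rule lets compositing start on
the frames whose inputs already exist.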

So for our workflow we need the speed of submission of array jobs but
the freedom to define complicated per array job dependencies. We toyed
with the idea of defining dependencies between arrays as patterns but
found that it still didn't provide us with the freedom we required.
I've included a picture of our pattern description language - maybe
it'll give you guys a giggle!

The other problem with SGE that I found (there are many ways to work
around this one but no "elegant" solution) was its inability to
provide a way to alter priorities on jobs within a project only. In
other words a project hierarchical priority scheme. I would like to be
able to boost a job's priority at any time within its project share
ONLY. This power can be given to project managers so they can manage
their own users' jobs and not affect other projects. This gives the
advantage of being able to use other projects' idle cycles without
affecting their overall share. The easiest solution I came up with was to
use an amount of override tickets (the amount would be hidden away in
some project supervisor's GUI somewhere) which was sufficient to
affect the inter-user tickets but nowhere near enough to disrupt the
inter-project tickets.

Anyway I've tried to summarise our workflow concerns and perhaps you
guys will find it useful. Sorry if it's badly written - I had one too
many beers for lunch (hey it's Friday!).

   ------- Additional comments from sgrell Mon Dec 12 02:39:17 -0700 2005 -------
Changed subcomponent.

Stephan

Attachments (1)

13 (9.6 KB) - added by dlove 9 years ago.

Change History (1)

Changed 9 years ago by dlove
