Opened 17 years ago
Last modified 10 years ago
#152 new feature
IZ920: Need a means to facilitate workflows in a scalable and better to handle fashion
Reported by: | andreas | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | sge | Version: | current |
Severity: | | Keywords: | scheduling |
Cc: |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=920]
Issue #: 920
Platform: All
Reporter: andreas (andreas)
Component: gridengine
OS: All
Subcomponent: scheduling
Version: current
CC: svdavidson
Status: NEW
Priority: P3
Resolution:
Issue type: FEATURE
Target milestone: ---
Assigned to: andreas (andreas)
QA Contact: andreas
URL:
Summary: Need a means to facilitate workflows in a scalable and better to handle fashion
Status whiteboard:
Attachments:
Date/filename | Description | Submitted by
---|---|---
Fri Apr 30 07:34:00 -0700 2004: feature-spec.sxd | An openoffice diagram of an array dependency pattern description (application/octet-stream) | daireb
Issue 920 blocks:
Votes for issue 920:
Opened: Thu Mar 25 04:21:00 -0700 2004
------------------------
DESCRIPTION:
Grid Engine shall provide better means of handling workflows in a scalable and more manageable fashion. End users wish to run large workflows comprising on the order of 35,000 individual jobs. Grid Engine 6.0's support for job dependencies (-hold_jid) already provides a means of expressing the dependencies, but with 35,000 jobs it takes a long time to submit all of these jobs *individually* to Grid Engine. Naturally, users also expect to be able to treat such workflows as a whole, in a scalable fashion, with operations such as qdel/qhold and with qstat/qmon. Our current understanding is that CLI support is the primary request, but API support might be beneficial as well. A means to monitor workflow progress (GUI) would certainly be nice.
WORKAROUND: The current means of handling workflows are the submit(1) -hold_jid option and qmake(1).
HOWTOFIX: Because of the scalability aspect, the key to this RFE appears to be a design that allows large numbers of mostly distinct but interdependent jobs to be grouped into a single unit that can be sent to qmaster as a whole. End users would probably expect the workflow submit operation to return a single job ID that could then be used as a handle (qmod/qdel/qstat).
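The HOWTOFIX idea can be sketched in Python. This is a minimal, hypothetical client-side model, not a Grid Engine interface: `Workflow`, `add_task`, `submit`, the `render.sh`/`comp.sh` commands and the counter standing in for a qmaster-assigned job ID are all illustrative assumptions. Tasks and their predecessors are collected locally, the whole dependency graph is checked once with the standard-library `graphlib.TopologicalSorter` (Python 3.9+), and the workflow is then handed over in one step that returns a single handle.

```python
"""Hypothetical client-side workflow object: many interdependent tasks are
collected, checked once, and submitted as one unit with one handle."""
import itertools
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Stand-in for the job ID qmaster would assign (assumption, not an SGE API).
_job_ids = itertools.count(1)


@dataclass
class Workflow:
    name: str
    commands: dict[str, str] = field(default_factory=dict)           # task -> command line
    predecessors: dict[str, set[str]] = field(default_factory=dict)  # task -> tasks it waits for

    def add_task(self, task: str, command: str, after: tuple[str, ...] = ()) -> None:
        """Register a task and the tasks it depends on (like -hold_jid, but local)."""
        self.commands[task] = command
        self.predecessors[task] = set(after)

    def validate(self) -> list[str]:
        """Reject unknown predecessors and cycles; return one valid execution order."""
        for task, preds in self.predecessors.items():
            unknown = preds - set(self.commands)
            if unknown:
                raise ValueError(f"{task!r} depends on unknown tasks {sorted(unknown)}")
        # static_order() raises graphlib.CycleError (a ValueError subclass) on cycles.
        return list(TopologicalSorter(self.predecessors).static_order())


def submit(workflow: Workflow) -> int:
    """Validate the whole graph once, then 'send' it as a single request.
    The single returned ID would serve as the handle for qdel/qhold/qstat."""
    workflow.validate()
    return next(_job_ids)


wf = Workflow("render")
wf.add_task("beauty", "render.sh beauty")
wf.add_task("shadow", "render.sh shadow")
wf.add_task("comp", "comp.sh", after=("beauty", "shadow"))
handle = submit(wf)  # one ID for the whole workflow
```

In this model a 35,000-task workflow costs one graph check and one submission instead of 35,000 separate submissions; how qmaster would then store and schedule such a graph efficiently is the open design question of this RFE.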
End users might also appreciate being able to monitor and control each job in a workflow as a workflow task.
------- Additional comments from andreas Fri Apr 30 04:55:40 -0700 2004 -------
It appears desirable to give workflow tasks as much flexibility as possible: ideally each task could have different resource requirements, a different command, a different environment, etc. On the other hand, it is important to keep both the overall amount of data transferred to qmaster and the amount of data kept in qmaster small. To achieve both goals as far as possible, it might be useful to design the interface for expressing each task so that any attribute known from sequential jobs can differ from task to task, while commonalities are exploited transparently behind the scenes to reduce the memory footprint.
------- Additional comments from andreas Fri Apr 30 04:58:34 -0700 2004 -------
Another aspect appears to be email notification: there might be cases where notification is required on a per-workflow-task basis. In addition, there is also a need for email notification for the workflow as a whole.
------- Additional comments from daireb Fri Apr 30 07:34:18 -0700 2004 -------
Created an attachment (id=13)
An openoffice diagram of an array dependency pattern description
------- Additional comments from daireb Fri Apr 30 07:35:08 -0700 2004 -------
Well, I thought I'd add an end-user perspective to this issue as it's particularly close to my heart! I evaluated SGE for 2 months and found a couple of problems with the software that were severe enough to put doubt in my superiors' minds and effectively end our interest in SGE (we are looking to develop our own software now!). The main problem was trying to bend SGE around our rendering pipeline. We have an in-house tool in which you can easily set up all the stages of a render.
An example might be: stage1 - render a "beauty" pass, stage2 - render a "shadow" pass, stage3 - composite both stages together. Each of these stages consists of a frame range in which each frame is a job in SGE. Currently the only way to submit this correctly to SGE would be to submit every frame of every stage and define the dependencies between them. With 3 stages of 100 frames each, that's 300 jobs. We need to be able to do renders with many, many stages, so we quickly get into thousands of frames/jobs. We also need to be able to "clump" frames together (i.e. one SGE job renders more than one frame sequentially). Finally, we need to be able to render frame ranges across the stages (e.g. I want frames 1-21 & 34-70 rendered). Dependencies need to be very flexible to deal with these scenarios.

Array jobs would be ideal for us, as frames are basically the same job with a slightly different command line (frame number / $SGE_TASK_ID). Using the above example, SGE can currently only define that all of stage 3 depends (-hold_jid) on all the frames/jobs of stage 1 and stage 2. We require that frame 1 of stage 3 be rendered as soon as possible... then frame 2, etc. So for our workflow we need the submission speed of array jobs but the freedom to define complicated per-array-task dependencies. We toyed with the idea of defining dependencies between arrays as patterns but found that it still didn't give us the freedom we required. I've included a picture of our pattern description language - maybe it'll give you guys a giggle!

The other problem I found with SGE (there are many ways to work around this one but no "elegant" solution) was its inability to alter the priorities of jobs within a project only; in other words, a hierarchical project priority scheme. I would like to be able to boost a job's priority at any time within its project share ONLY.
This power could be given to project managers so they can manage their own users' jobs without affecting other projects. This gives the advantage of being able to use other projects' idle cycles without affecting their overall share. The easiest solution I came up with was to use a number of override tickets (the amount would be hidden away in some project supervisor's GUI somewhere) that was sufficient to affect the inter-user tickets but nowhere near enough to disrupt the inter-project tickets. Anyway, I've tried to summarise our workflow concerns and perhaps you guys will find it useful. Sorry if it's badly written - I had one too many beers for lunch (hey, it's Friday!).
------- Additional comments from sgrell Mon Dec 12 02:39:17 -0700 2005 -------
Changed subcomponent. Stephan
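daireb's per-frame requirement (frame 1 of the composite stage should start as soon as frame 1 of the beauty and shadow passes is done, with frames clumped into jobs and arbitrary frame ranges) can be sketched in Python. This is a minimal illustration under stated assumptions: the comma-separated range syntax `1-21,34-70` (instead of `1-21 & 34-70`), the stage names and the helpers `parse_frames`, `clump` and `chunk_dependencies` are hypothetical, not part of Grid Engine. For each job of a downstream stage it computes exactly which upstream jobs it must wait for, which is finer-grained than the stage-wide -hold_jid dependency described in the comment.

```python
"""Hypothetical sketch: per-frame dependencies between render stages."""


def parse_frames(spec: str) -> list[int]:
    """Expand a spec such as '1-21,34-70' into a sorted list of frame numbers."""
    frames: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            frames.update(range(first, last + 1))
        else:
            frames.add(int(part))
    return sorted(frames)


def clump(frames: list[int], size: int) -> list[list[int]]:
    """Group consecutive frames into chunks; one job renders a chunk sequentially."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]


def chunk_dependencies(
    chunks: list[list[int]], upstream: dict[str, list[list[int]]]
) -> list[list[tuple[str, int]]]:
    """For each downstream chunk, list the upstream (stage, chunk index) pairs that
    render any of its frames. Each downstream job can then start as soon as exactly
    those jobs finish, rather than waiting for entire stages."""
    deps = []
    for chunk in chunks:
        needed = set(chunk)
        deps.append(sorted(
            (stage, index)
            for stage, stage_chunks in upstream.items()
            for index, up in enumerate(stage_chunks)
            if needed & set(up)
        ))
    return deps


frames = parse_frames("1-21,34-70")
beauty = clump(frames, 4)  # 4 frames per job
shadow = clump(frames, 2)  # 2 frames per job
comp = clump(frames, 1)    # 1 frame per job
deps = chunk_dependencies(comp, {"beauty": beauty, "shadow": shadow})
print(deps[2])  # composite job for frame 3 → [('beauty', 0), ('shadow', 1)]
```

With 4-frame beauty jobs and 2-frame shadow jobs, the composite job for frame 3 waits only for beauty job 0 and shadow job 1, while the rest of both stages can still be running. The scan is quadratic in the number of chunks, which is fine for a sketch but would need an index from frame to chunk at the scale of thousands of jobs.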
Attachments (1)