[GE users] Serial, then parallel, then serial - jobs
Andreas.Haas at Sun.COM
Fri Sep 7 11:15:57 BST 2007
On Fri, 7 Sep 2007, Iain Milne wrote:
>> -----Original Message-----
>> From: John Hearns [mailto:john.hearns at streamline-computing.com]
>> Sent: 07 September 2007 08:45
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Serial, then parallel, then serial - jobs
>> On Fri, 2007-09-07 at 08:17 +0100, Iain Milne wrote:
>>> Hi all,
>>> I have a situation where it would be preferable to be able to run a job
>>> that allows me to do a single task first, then a set of array job tasks,
>>> then a final single task to finish up. The onus being that the array
>>> jobs shouldn't start until the first task has finished, and
>>> the final task shouldn't start until the array jobs are done. The
>>> reason being that the array jobs require data that is only created by
>>> the first job, and the final job has to collate the results from the
>>> array jobs back together.
>>> Is this possible within a single script (to qsub) or can it only be
>>> handled manually with three separate scripts and not submitting the 2nd
>>> until the 1st is done, or the 3rd until the 2nd is complete?
>> The simple way to do this is via a job dependency.
>> For the second and third jobs:
>> qsub -hold_jid first-job-number(,second-job-number)
>> In the job submission script for the second and third jobs you'll need a
>> bit of logic to parse the output from qstat and find those job numbers.
>> The array_submitter.sh example script under $SGE_ROOT/examples/root
>> might be useful to you.
>> However, if the pre-processing and post-processing steps are always the
>> same for every one of the array jobs, why not create a queue to run
>> these jobs in and create "prolog" and "epilog" scripts?
>> Looks like prolog and epilog are the neatest way for you - and you only
>> use one qsub, as you want.
>> man queue_conf and look for 'prolog'
> For the prolog and epilog stuff, do these run each time for each job in
> the array, or just once in total? It's the latter case I'd be after, eg:
> break DNA alignment into [n] lengths, then run an array job with each
> sub-array tackling a separate part of the alignment; then finally
> running a task at the end to collate everything back together.
That you must model with a special job, since prolog and epilog are run
once for each job or array job task.
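As a rough sketch of that modelling (not a tested recipe), the three steps
can be submitted as separate jobs chained with -hold_jid. This assumes
-hold_jid is given job names rather than job numbers, which avoids parsing
qstat output. The scripts split.sh, align.sh and collate.sh, the TAG prefix
and the QSUB override are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: serial -> array -> serial, chained by job name with -hold_jid.
# split.sh, align.sh and collate.sh are hypothetical job scripts.

# Set QSUB='echo qsub' to preview the commands instead of submitting.
QSUB=${QSUB:-qsub}
# Name prefix for this run; SGE job names must not start with a digit.
TAG=${TAG:-aln$$}

submit_flow() {
    # a: single pre-processing job (e.g. split the alignment)
    $QSUB -N "${TAG}_a" split.sh
    # b: array job with 4 tasks, held until job a has finished
    $QSUB -N "${TAG}_b" -hold_jid "${TAG}_a" -t 1-4 align.sh
    # c: final collation job, held until every task of b has finished
    $QSUB -N "${TAG}_c" -hold_jid "${TAG}_b" collate.sh
}
```

Calling submit_flow once submits all three jobs straight away; the scheduler
keeps b and c on hold until their predecessors are done. Inside align.sh each
task can pick its piece of the input via $SGE_TASK_ID.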
> Or another way:
> Run a
> Run b.i b.ii b.iii b.iv
> Run c
Actually with DRMAA there is also a programmatic interface that
could be used for this. If you think of it as a workflow you may find
useful suggestions in the DRMAA Ruby based flow processor.
Here is a sample flow: