[GE users] Serial, then parallel, then serial - jobs

John Hearns john.hearns at streamline-computing.com
Fri Sep 7 08:45:24 BST 2007

On Fri, 2007-09-07 at 08:17 +0100, Iain Milne wrote:
> Hi all,
> I have a situation where it would be preferable to be able to run a job
> that allows me to do a single task first, then a set of array job tasks,
> then a final single task to finish up. The onus being that the array
> jobs shouldn't start until the first task has finished, and similarly,
> the final task shouldn't start until the array jobs are done. The
> reason being that the array jobs require data that is only created by
> the first job, and the final job has to collate the results from the
> array jobs back together.
> Is this possible within a single script (to qsub) or can it only be
> handled manually with three separate scripts and not submitting the 2nd
> until the 1st is done, or the 3rd until the 2nd is complete?

The simple way to do this is via a job dependency.
For the second and third jobs:

qsub -hold_jid  first-job-number(,second-job-number)

In the job submission script for the second and third jobs you'll need a
bit of logic to parse the output from qstat and find those job numbers.
The array_submitter.sh example script under $SGE_ROOT/examples/root
might be useful to you.
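
As a rough sketch of the dependency approach: as far as I know, -hold_jid
also accepts job names (for jobs owned by the same user), so one way to
avoid parsing qstat is to name each job with -N and hold on the name
instead of the number. This is a variant of the above, not the only way.
The script names prep.sh, work.sh and collate.sh, the job name prefixes,
and the QSUB variable are all made up for illustration.

```shell
#!/bin/sh
# Sketch: chain three jobs with -hold_jid, referring to jobs by name.
# prep.sh, work.sh and collate.sh are hypothetical job scripts.
# QSUB is an illustrative hook so the command can be swapped for a stub.
QSUB=${QSUB:-qsub}

submit_pipeline() {
    tag=$1      # unique suffix so job names don't clash with other runs
    ntasks=$2   # number of array tasks

    # 1. Single pre-processing job.
    $QSUB -N "prep_$tag" prep.sh &&
    # 2. Array job, held until the pre-processing job has finished.
    $QSUB -N "work_$tag" -hold_jid "prep_$tag" -t "1-$ntasks" work.sh &&
    # 3. Single collation job, held until the whole array job has finished.
    $QSUB -N "collate_$tag" -hold_jid "work_$tag" collate.sh
}
```

Calling something like "submit_pipeline run$$ 10" would then queue all
three steps in one go; setting QSUB to a stub such as echo gives a dry
run that only prints the arguments.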

However, if the pre-processing and post-processing steps are always the
same for every one of the array jobs, why not create a queue to run
these jobs in and create "prolog" and "epilog" scripts?

Looks like prolog and epilog are the neatest way for you - and you only
use one qsub, as you want.

See 'man queue_conf' and look for 'prolog'.
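
For what it's worth, a minimal sketch of wiring the two scripts into a
queue, assuming qconf -mattr is available on your installation and you
have the rights to modify the queue. The queue name and script paths are
placeholders, and QCONF is just an illustrative hook for testing.

```shell
#!/bin/sh
# Sketch: set the prolog and epilog attributes of an existing queue.
# The queue name and script paths passed in are hypothetical.
QCONF=${QCONF:-qconf}

set_queue_hooks() {
    queue=$1    # name of the queue the array jobs will run in
    prolog=$2   # absolute path to the pre-processing script
    epilog=$3   # absolute path to the post-processing script

    # Modify one queue attribute at a time; stop if the first call fails.
    $QCONF -mattr queue prolog "$prolog" "$queue" &&
    $QCONF -mattr queue epilog "$epilog" "$queue"
}
```

For example: set_queue_hooks array.q /path/to/prolog.sh /path/to/epilog.sh
and then check the result with "qconf -sq array.q".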
