[GE users] Serial, then parallel, then serial - jobs

Iain Milne imilne at scri.ac.uk
Fri Sep 7 08:53:06 BST 2007


> -----Original Message-----
> From: John Hearns [mailto:john.hearns at streamline-computing.com] 
> Sent: 07 September 2007 08:45
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Serial, then parallel, then serial - jobs
> 
> On Fri, 2007-09-07 at 08:17 +0100, Iain Milne wrote:
> > Hi all,
> > 
> > I have a situation where it would be preferable to be able to run a job
> > that allows me to do a single task first, then a set of array job tasks,
> > then a final single task to finish up. The requirement is that the array
> > jobs shouldn't start until the first task has finished, and similarly,
> > the final task shouldn't start until the array jobs are done. The
> > reason is that the array jobs require data that is only created by
> > the first job, and the final job has to collate the results from the
> > array jobs back together.
> > 
> > Is this possible within a single script (to qsub) or can it only be
> > handled manually with three separate scripts and not submitting the 2nd
> > until the 1st is done, or the 3rd until the 2nd is complete?
> 
> The simple way to do this is via a job dependency.
> For the second and third jobs:
> 
> qsub -hold_jid  first-job-number(,second-job-number)
> 
> In the job submission script for the second and third jobs you'll need a
> bit of logic to parse the output from qstat and find those job numbers.
> The array_submitter.sh example script under $SGE_ROOT/examples/root
> might be useful to you.
> 
> 
> 
> However, if the pre-processing and post-processing steps are always the
> same for every one of the array jobs, why not create a queue to run
> these jobs in and create "prolog" and "epilog" scripts?
> 
> Looks like prolog and epilog are the neatest way for you, and you only
> use one qsub, as you want.
> 
> man queue_conf   and look for 'prolog'

Interesting.

For the prolog and epilog stuff, do these run once for every task in
the array, or just once in total? It's the latter case I'd be after, e.g.
break a DNA alignment into [n] lengths, then run an array job with each
task tackling a separate part of the alignment, then finally run a
task at the end to collate everything back together.

Or another way:
  Run a
  Run b.i  b.ii  b.iii  b.iv
  Run c
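
For reference, the a / b.i-b.iv / c pattern could be sketched with the
-hold_jid approach John mentions, using job names instead of parsed job
numbers (qsub's -hold_jid accepts job names as well as numbers in SGE 6.x).
This is only a sketch under assumptions: the script names prepare.sh,
align_part.sh and collate.sh, the job names prep/parts/collate, and the
QSUB override used for a dry run are all made up for illustration.

```shell
#!/bin/sh
# Submit a three-stage pipeline: one serial job, an array job, one serial job.
# Script and job names below are hypothetical placeholders.
submit_pipeline() {
    ntasks=$1                 # number of array tasks for the middle stage
    qsub_cmd=${QSUB:-qsub}    # set QSUB="echo qsub" to print instead of submit

    # Stage a: serial pre-processing job.
    $qsub_cmd -N prep prepare.sh

    # Stage b: array job, held until the job named "prep" has finished.
    $qsub_cmd -N parts -hold_jid prep -t 1-"$ntasks" align_part.sh

    # Stage c: serial collation job, held until every task of "parts" is done.
    $qsub_cmd -N collate -hold_jid parts collate.sh
}

# Example (dry run, prints the three qsub command lines):
#   QSUB="echo qsub" submit_pipeline 4
```

Holding by name matches any of your own jobs with that name, so the names
would need to be unique per run (for example by appending a timestamp) if
several pipelines might be queued at once.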

Thanks

Iain
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.



